Employees resign or retire early for countless reasons, ranging from dissatisfaction with their employer to family and health concerns. Whatever the reason, employers need to understand the causes of attrition in order to reduce it. Surveys and staff feedback give employers valuable information about why their employees resign or retire. That information is used to inform attraction and retention initiatives and to improve work practices across an organization, so that the organization is considered an employer of choice.
In this project, our goal is to provide answers to the project questions above.
In this project, we'll work with two datasets:
Both datasets originated from departments in Queensland, Australia. The DETE and TAFE Exit Surveys were developed to effectively canvass the opinions and attitudes of departing employees to identify a wide range of operational, organizational and personal variables affecting the decision to leave.
Data Dictionary
A dictionary wasn't provided with the datasets. However, for this project, we'll use our general knowledge to define some of the columns we're going to work with. We'll provide another combined dictionary later as we move forward.
dete_survey.csv
Column | Definition |
---|---|
ID | An id used to identify the participant of the survey |
SeparationType | The reason why the person's employment ended |
Cease Date | The year or month the person's employment ended |
DETE Start Date | The year the person began employment with the DETE |
tafe_survey.csv
Column | Definition |
---|---|
Record ID | An id used to identify the participant of the survey |
Reason for ceasing employment | The reason why the person's employment ended |
LengthofServiceOverall. Overall Length of Service at Institute (in years) | The length of the person's employment (in years). |
In this project, we found that 240 workers in total resigned due to some kind of job dissatisfaction. Of these, employees who had spent 7 or more years at a job (the Established and Veteran categories) resigned due to dissatisfaction more often than those with fewer than 7 years of service, together accounting for more than 60% of all dissatisfied employees.
Further into the analysis, we noticed that resignations due to dissatisfaction generally rise with age: the percentage of workers who resigned due to job dissatisfaction increases progressively from their 30s through their 40s, with the highest percentage among those 50 years of age and above. However, workers in their 20s and below resigned due to job dissatisfaction more often than those in their 30s.
# Import needed libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# Read in the datasets with pandas
dete_survey = pd.read_csv('dete_survey.csv', encoding="Latin-1")
tafe_survey = pd.read_csv('tafe_survey.csv', encoding="Latin-1")
The first step is to get an overview of what the two datasets look like. These initial, fairly simple actions will guide our data cleaning and, eventually, our analysis. We'll familiarize ourselves with the two datasets using a few important methods. By the end, we'll have enough knowledge to guide us as we move forward with the analysis and toward our goals.
# This line of code helps to display full columns in the output
pd.options.display.max_columns = 150
# Familiarizing ourselves with the DETE dataset
dete_survey.info()
dete_survey.head()
# Checking for unique entries in the column
dete_survey['SeparationType'].value_counts()
# Getting the number of null values in the dataframe
dete_survey.isnull().sum()
# Familiarizing ourselves with the TAFE dataset
tafe_survey.info()
tafe_survey.head()
# Checking for unique entries in the column
tafe_survey['Reason for ceasing employment'].value_counts()
# Getting the number of null values in the dataframe
tafe_survey.isnull().sum()
OBSERVATIONS
As we can see from the information above, the `dete_survey` dataframe contains 56 columns and 822 rows of data. We also see that the `DETE Start Date` column contains missing values entered as `Not Stated`, which pandas recognizes as a string rather than a missing value.

The `tafe_survey` dataframe contains 72 columns and 702 rows of data. Its `LengthofServiceOverall. Overall Length of Service at Institute (in years)` column contains missing values shown as `NaN`.

The two dataframes contain many of the same columns, but the column names are different.
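To see why `Not Stated` matters, here's a minimal sketch using a made-up, in-memory CSV (not the real survey file) showing the difference the `na_values` parameter makes:

```python
import io
import pandas as pd

# A toy CSV mimicking the 'Not Stated' entries in the real dataset
csv_text = "ID,DETE Start Date\n1,1984\n2,Not Stated\n3,2002\n"

# Without na_values, 'Not Stated' is read as an ordinary string
raw = pd.read_csv(io.StringIO(csv_text))
print(raw['DETE Start Date'].isnull().sum())   # 0 — nothing counts as missing

# With na_values, the same cell becomes NaN
fixed = pd.read_csv(io.StringIO(csv_text), na_values='Not Stated')
print(fixed['DETE Start Date'].isnull().sum()) # 1
```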
Now we know that the dataframes contain many columns that are not needed to complete this project. Also, there are multiple columns/answers that indicate an employee resigned because they were dissatisfied.
Hence, before we proceed further with the data exploration, we'll read `dete_survey.csv` into pandas again. This time, the `Not Stated` values will be read as `NaN`. We'll then proceed to remove the unwanted columns.
# Makes pandas treat the specified values as missing
dete_survey = pd.read_csv('dete_survey.csv', na_values='Not Stated', encoding="Latin-1")
# Removes unwanted columns from each dataframe
dete_survey_updated = dete_survey.drop(dete_survey.columns[28:49], axis=1)
tafe_survey_updated = tafe_survey.drop(tafe_survey.columns[17:66], axis=1)
# Checks that the columns were removed
print(dete_survey_updated.columns)
print(tafe_survey_updated.columns)
From the new information above, we see that the `dete_survey_updated` dataframe now contains 35 columns and 822 rows, and that missing values now appear as `NaN`. The `tafe_survey_updated` dataframe now contains 23 columns and 702 rows.
Now that we have successfully made these changes, we'll get back on track to our data exploration and cleaning.
As stated earlier in our observation, each dataframe contains many of the same columns, but the column names are different. Below is a combined dictionary for some similar columns in the two datasets:
dete_survey | tafe_survey | Definition |
---|---|---|
ID | Record ID | An id used to identify the participant of the survey |
SeparationType | Reason for ceasing employment | The reason why the participant's employment ended |
Cease Date | CESSATION YEAR | The year or month the participant's employment ended |
DETE Start Date | The year the participant began employment with DETE | |
LengthofServiceOverall. Overall Length of Service at Institute (in years) | The length of the person's employment (in years) | |
Age | CurrentAge. Current Age | The age of the participant |
Gender | Gender. What is your Gender? | The gender of the participant |
We'll standardize the column names in both dataframes and make them uniform, since we'll later combine the two dataframes. In the next few cells, we'll show the state of the column names before and after the cleaning process, in order to keep the flow of the whole process clear.
# Initial form of the column names
print(dete_survey_updated.columns)
# Column head: BEFORE CLEANING
dete_survey_updated.head(2)
# Updates the column names to a standardized form (strip first, then lowercase, then replace spaces)
dete_survey_updated.columns = dete_survey_updated.columns.str.strip().str.lower().str.replace(' ', '_')
# Updated form of the column names
print(dete_survey_updated.columns)
# Column head: AFTER CLEANING
dete_survey_updated.head(2)
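The order of the string operations matters: stripping must happen before replacing spaces, or a trailing space would end up as a trailing underscore. A toy sketch with made-up column names (mirroring the real ones) illustrates:

```python
import pandas as pd

# Note the trailing space in 'Cease Date '
df = pd.DataFrame(columns=['ID', 'SeparationType', 'Cease Date ', 'DETE Start Date'])

# Strip outer whitespace first, then lowercase, then replace inner spaces
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')
print(list(df.columns))  # ['id', 'separationtype', 'cease_date', 'dete_start_date']
```

Had we replaced spaces before stripping, `'Cease Date '` would have become `'cease_date_'` instead.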
In the following cells, we'll update the column names in the `tafe_survey_updated` dataframe to be uniform with the `dete_survey_updated` dataframe.
# A dictionary mapping old column names to new ones
updated_names = {
    "Record ID": "id",
    "CESSATION YEAR": "cease_date",
    "Reason for ceasing employment": "separationtype",
    "Gender. What is your Gender?": "gender",
    "CurrentAge. Current Age": "age",
    "Employment Type. Employment Type": "employment_status",
    "Classification. Classification": "position",
    "LengthofServiceOverall. Overall Length of Service at Institute (in years)": "institute_service",
    "LengthofServiceCurrent. Length of Service at current workplace (in years)": "role_service",
}
# Updates the column names in dataframe
tafe_survey_updated = tafe_survey_updated.rename(updated_names, axis=1)
tafe_survey_updated.columns
We can see that the names of the columns representing the same information in both dataframes are now uniform. Thus, our analysis is made less confusing. This is especially important because we aim to combine both dataframes to complete our analysis.
Next, we'll remove more of the data we don't need to answer our questions - one of which is:
Again, let's take a look at the unique values in the `separationtype` column in each dataframe:
dete_survey_updated['separationtype'].value_counts()
tafe_survey_updated['separationtype'].value_counts()
We can see that each dataframe contains several different separation types. However, for this particular analysis, we're only interested in survey respondents who resigned, so only those separation types containing the string `Resignation` will be analyzed. We can also see that the `dete_survey_updated` dataframe contains multiple separation types with the string `Resignation`:
# The pattern to search for and select
resignations = dete_survey_updated['separationtype'].str.contains(r"[Rr]esignation", na=False)
# Selects only data that have the desired string
dete_resignations = dete_survey_updated[resignations].copy()
dete_resignations.head()
# Keeps only the text before the hyphen, e.g. 'Resignation-Other reasons' -> 'Resignation'
dete_resignations['separationtype'] = dete_resignations['separationtype'].str.split('-').str[0]
# Unique values in the updated column
dete_resignations['separationtype'].value_counts()
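As a side note on the `na=False` argument used in the selection above: without it, missing entries would produce `NaN` instead of `False`, which can't be used in a boolean mask. A toy sketch (with made-up separation types) shows the difference:

```python
import pandas as pd

s = pd.Series(['Resignation-Other reasons', 'Age Retirement', None])

# Without na=False, missing entries yield NaN rather than a clean boolean
print(s.str.contains(r"[Rr]esignation").tolist())   # [True, False, nan]

# na=False treats missing entries as "no match", giving a usable mask
mask = s.str.contains(r"[Rr]esignation", na=False)
print(s[mask].tolist())                             # ['Resignation-Other reasons']
```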
Above, we selected only the rows whose `separationtype` column contains the string "Resignation", because we're only going to analyze respondents who resigned from their job. We can see that the `dete_resignations` dataframe now contains 311 rows of data.
The `separationtype` column in the `tafe_survey_updated` dataframe has only one resignation string, so next we'll select the rows containing `Resignation` directly into a new dataframe.
# Selects only data which have the desired string
tafe_resignations = tafe_survey_updated[tafe_survey_updated['separationtype'] == 'Resignation'].copy()
tafe_resignations.head()
# Unique values in the updated column
tafe_resignations['separationtype'].value_counts()
After selecting just the rows that contain `Resignation` in their `separationtype` column, we can see above that the `tafe_resignations` dataframe now contains 340 rows of data.
We'll continue by checking for other errors in the dataframes, examining each column we need for this analysis:
Let's take a look at `cease_date` and `dete_start_date` in the `dete_resignations` dataframe and see whether any adjustments are needed, based on these points:

- Since `cease_date` is the last year of the person's employment and `dete_start_date` is the person's first year of employment, it wouldn't make sense to have years after the current date.
- It's also unlikely that `dete_start_date` was before the year 1940.

So if many years in `dete_start_date` are higher than the `cease_date` or lower than 1940, we wouldn't want to continue with the analysis, because there may be something very wrong with the data. However, if there are just a small number of unusually high or low values, we'll remove them.
dete_resignations['cease_date'].value_counts(ascending=True)
dete_resignations['dete_start_date'].value_counts().sort_index(ascending=True)
The `cease_date` column in the `dete_resignations` dataframe contains two different date formats: some values contain only a year, while others contain a month and a year. We need to standardize these dates into a single form across the rows.
# Pattern for extracting a four-digit year
pattern = r"(?P<Year>[1-2][0-9]{3})"
dete_resignations['cease_date'] = dete_resignations['cease_date'].astype(str).str.extract(pattern).astype(float)
dete_resignations['cease_date'].value_counts().sort_index()
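To make the extraction step concrete, here's a toy sketch with made-up cease dates in the two formats seen in the real column:

```python
import pandas as pd

# Toy cease dates: some 'month/year', some plain 'year'
dates = pd.Series(['05/2012', '2013', '09/2010', '2014'])

# Named capture group pulls out the four-digit year from either format
pattern = r"(?P<Year>[1-2][0-9]{3})"
years = dates.str.extract(pattern)['Year'].astype(float)
print(years.tolist())  # [2012.0, 2013.0, 2010.0, 2014.0]
```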
# Box plot to check for outliers
plt.figure(figsize=(10, 10))
dete_resignations.boxplot(column=['cease_date']).set_ylim(2005,2018)
# Box plot to check for outliers
plt.figure(figsize=(10, 10))
dete_resignations.boxplot(column=['dete_start_date']).set_ylim(1960,2014)
# Sorted values in the cease_date column
tafe_resignations['cease_date'].value_counts().sort_index()
# Box plot to check for outliers
plt.figure(figsize=(10, 10))
tafe_resignations.boxplot(column=['cease_date']).set_ylim(2005,2018)
Here is what we can deduce from the information above:

- There are no major issues with the years in `dete_start_date` and `cease_date`.
- The `dete_resignations` dataframe contains outliers (2006 and 2010), while `tafe_resignations` has an outlier at 2009 (a year absent from the other dataframe).
- The `tafe_resignations` dataframe also contains far more cease dates in 2010 (68) than `dete_resignations` does (2).

Since we're not interested in analyzing the data by year, we'll leave these values as they are.
So far so good, we have rectified the years in the columns to follow a single uniform pattern in the two dataframes. We also didn't find any major issues with the years. Additionally, we saw that the years in each dataframe don't span the same number of years.
As a reminder, our goal is to answer this question:
In order to answer the questions, we'll use the now-verified years in the `dete_resignations` dataframe to create a new column. In the Human Resources field, the length of time an employee has spent in a workplace is referred to as their years of service. The `tafe_resignations` dataframe already contains a "service" column, which we renamed to `institute_service`. Therefore, in order to analyze both surveys together, we'll create a corresponding `institute_service` column in `dete_resignations`.
dete_resignations['institute_service'] = dete_resignations['cease_date'] - dete_resignations['dete_start_date']
dete_resignations['institute_service'].head()
We have successfully created a new corresponding column named `institute_service` in the `dete_resignations` dataframe, so we can now analyze the survey respondents according to their length of employment. First, we'll identify and classify employees who resigned because they were dissatisfied. The reason a worker's employment ended at either institute is indicated by the `separationtype` column in both datasets; other factors that may have contributed to the job cessation are indicated in other columns.
According to a dictionary meaning,
Dissatisfaction: The feeling of being displeased and discontent.
Based on that meaning, we'll select some columns from the columns in both dataframes to categorize employees as "dissatisfied". The selected columns are listed below
If the employee indicated any of the above caused them to resign, we'll mark them as dissatisfied in a new column as following:
Let's start by viewing the values in the two `Contributing Factors` columns in the `tafe_resignations` dataframe.
# Checks the unique values
tafe_resignations['Contributing Factors. Dissatisfaction'].value_counts()
# Checks the unique values
tafe_resignations['Contributing Factors. Job Dissatisfaction'].value_counts()
In the `Contributing Factors. Dissatisfaction` column, 55 respondents resigned due to dissatisfaction; in the `Contributing Factors. Job Dissatisfaction` column, 62 respondents did.
# Updates values in columns to either True, False, or NaN
def update_vals(value):
if pd.isnull(value):
return np.nan
elif value == '-':
return False
else:
return True
# Creates a new column for dissatisfaction state
tafe_resignations['dissatisfied'] = tafe_resignations[
    ['Contributing Factors. Dissatisfaction', 'Contributing Factors. Job Dissatisfaction']
].applymap(update_vals).any(axis=1, skipna=False)
tafe_resignations_up = tafe_resignations.copy()
# Checks the unique values of the new column
tafe_resignations_up['dissatisfied'].value_counts(dropna=False)
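As a sanity check on this logic, here's a toy sketch with made-up factor values. It uses `Series.map` per column, which behaves like the `applymap` call above; note that the exact result for an all-`NaN` row under `skipna=False` varies between pandas versions, so we only look at the unambiguous rows:

```python
import numpy as np
import pandas as pd

# Made-up contributing-factor entries: '-' means "not selected"
factors = pd.DataFrame({
    'Contributing Factors. Dissatisfaction':     ['-', 'Dissatisfaction', np.nan, '-'],
    'Contributing Factors. Job Dissatisfaction': ['-', '-', np.nan, 'Job Dissatisfaction'],
})

def update_vals(value):
    if pd.isnull(value):
        return np.nan
    elif value == '-':
        return False
    return True

# Map every cell to True/False/NaN, then take a row-wise "any";
# skipna=False avoids silently collapsing an all-NaN row to False
mapped = factors.apply(lambda col: col.map(update_vals))
dissatisfied = mapped.any(axis=1, skipna=False)
print(dissatisfied.iloc[[0, 1, 3]].tolist())  # [False, True, True]
```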
# Creates a new column for dissatisfaction state
dete_resignations['dissatisfied'] = dete_resignations[
    ['job_dissatisfaction', 'dissatisfaction_with_the_department', 'physical_work_environment',
     'lack_of_recognition', 'lack_of_job_security', 'work_location', 'employment_conditions',
     'work_life_balance', 'workload']
].any(axis=1, skipna=False)
dete_resignations_up = dete_resignations.copy()
# Checks the unique values of the new column
dete_resignations_up['dissatisfied'].value_counts(dropna=False)
Hooray! We've successfully created a new `dissatisfied` column in both dataframes, classifying each respondent as having resigned due to job dissatisfaction (`True`) or for other reasons (`False`). This is necessary because we plan to combine the two dataframes and make deductions based on this classification.
Up until this moment, we have worked to clean up and add valuable data. We've carried out these changes separately on each dataframe. Now, we're finally ready to combine the two datasets and take a big step towards answering our questions.
We're going to start by adding a column to each dataframe. This new column will allow us to easily distinguish between the two datasets after we've combined them.
# Adds a new column to the datasets
dete_resignations_up['institute'] = 'DETE'
tafe_resignations_up['institute'] = 'TAFE'
# Combines the datasets
combined = pd.concat([dete_resignations_up, tafe_resignations_up], ignore_index=True)
# Checks the number of non null values in each column
combined.notnull().sum().sort_values()
We've successfully combined the two datasets. Next, we checked the number of non-null values in each column in order to identify further unnecessary columns to drop from the `combined` dataframe.
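To illustrate what `pd.concat` does here, a toy sketch with made-up stand-ins for the two cleaned dataframes:

```python
import pandas as pd

# Toy stand-ins for the two cleaned dataframes (made-up values)
dete = pd.DataFrame({'id': [1, 2], 'cease_date': [2012.0, 2013.0],
                     'dissatisfied': [True, False], 'institute': 'DETE'})
tafe = pd.DataFrame({'id': [3], 'dissatisfied': [True], 'institute': 'TAFE'})

# ignore_index=True builds a fresh 0..n-1 index for the stacked rows
combined_demo = pd.concat([dete, tafe], ignore_index=True)

print(len(combined_demo))                          # 3
print(combined_demo.index.tolist())                # [0, 1, 2]
# Columns present in only one frame are filled with NaN for the other
print(combined_demo['cease_date'].isnull().sum())  # 1
```

This NaN-filling behavior is exactly why counting non-null values per column is the next step.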
# Columns prior to dropping
combined.columns
# Removes columns with fewer than 500 non-null values
combined_updated = combined.dropna(thresh=500, axis=1).copy()
# Columns after dropping
combined_updated.columns
Above, we dropped all columns with fewer than 500 non-null values. Columns that sparse contain too little data to contribute meaningfully to our analysis, and removing them makes the combined data easier to manage.
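The `thresh` parameter of `dropna` can be confusing, so here's a toy sketch (with a made-up frame and a smaller threshold) showing its meaning: a column survives only if it has at least `thresh` non-null values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'keep':   [1, 2, 3, 4],               # 4 non-null values
    'sparse': [np.nan, np.nan, np.nan, 9] # only 1 non-null value
})

# thresh=3 with axis=1 keeps only columns having at least 3 non-null values
trimmed = df.dropna(thresh=3, axis=1)
print(list(trimmed.columns))  # ['keep']
```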
Before we proceed to the analysis proper, we want to clean up the `institute_service` column. Let's see the unique values in that particular column below:
combined_updated['institute_service'].value_counts(dropna=False)
The column contains values in several very different formats. Tricky business! In order to analyze the data, we'll have to convert these values into the categories below:
This analysis is based on this article, which makes the argument that understanding employee's needs according to career stage instead of age is more effective.
# Extracts years of service from column and converts to float
combined_updated['institute_service_up'] = combined_updated['institute_service'].astype('str').str.extract(r"(\d+)")
combined_updated['institute_service_up'] = combined_updated['institute_service_up'].astype('float')
# Confirms successful extraction
combined_updated['institute_service_up'].value_counts()
# Sorts years into different categories
def categorize(value):
    if pd.isnull(value):
        return np.nan
    elif value >= 11:
        return "Veteran"
    elif 7 <= value < 11:
        return "Established"
    elif 3 <= value < 7:
        return "Experienced"
    else:
        return "New"
# Applies function to the column
combined_updated['service_cat'] = combined_updated['institute_service_up'].apply(categorize)
# Confirms the changes
combined_updated['service_cat'].value_counts()
We categorized the employees into four groups based on the number of years spent working, and created a new `service_cat` column to hold the category each employee falls into. Now it's easier to analyze our data and answer some salient questions.
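As a design note, the same binning can be done declaratively with `pd.cut`; here's a sketch (with made-up service values) using the same boundaries as the `categorize()` function, which also passes missing values through as `NaN`:

```python
import numpy as np
import pandas as pd

service = pd.Series([1.0, 5.0, 8.0, 20.0, np.nan])

# right=False gives half-open bins [0,3), [3,7), [7,11), [11,inf)
cats = pd.cut(
    service,
    bins=[0, 3, 7, 11, np.inf],
    labels=['New', 'Experienced', 'Established', 'Veteran'],
    right=False,
)
print(cats.tolist())  # ['New', 'Experienced', 'Established', 'Veteran', nan]
```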
In order to get started with our data analysis, we'll perform a small piece of it first: filling in the missing values in the `dissatisfied` column and then aggregating the data.
# Checks the unique values in the column
combined_updated['dissatisfied'].value_counts(dropna=False)
# Replaces missing values in the column
combined_updated['dissatisfied'] = combined_updated['dissatisfied'].fillna(False)
So far, we have found that 240 workers in total resigned due to some kind of job dissatisfaction. To continue with the analysis, we needed to fill the 8 missing values in the column with its most frequent value. Since `False` occurs most often in the `dissatisfied` column, we replaced the missing values with `False`.
All the values in the `dissatisfied` column are now Boolean, meaning they're either `True` or `False`. Thus, we'll use a pivot table to aggregate the column and calculate the percentage of people in each group. This works because the `df.pivot_table()` method treats Boolean values as integers: `True` counts as `1` and `False` counts as `0`.
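A toy sketch (with made-up rows) of why the pivot-table mean of a Boolean column equals the fraction of dissatisfied employees per group:

```python
import pandas as pd

df = pd.DataFrame({
    'service_cat':  ['New', 'New', 'Veteran', 'Veteran'],
    'dissatisfied': [True, False, True, True],
})

# pivot_table defaults to the mean; the mean of booleans is the fraction of True
table = df.pivot_table(index='service_cat', values='dissatisfied')
print(table['dissatisfied'].tolist())  # [0.5, 1.0]
```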
# Calculate and visualize the percentage of dissatisfied employees
table_s_cat = combined_updated.pivot_table(index='service_cat', values='dissatisfied')
print(table_s_cat)
# Data to plot bar chart
plt.figure(figsize=(10,10))
sns.set_style('darkgrid')
ax = sns.barplot(x=table_s_cat.index, y=table_s_cat['dissatisfied'], palette=sns.color_palette('GnBu_d'))
ax.set_yticks([])
ax.set_title('Percentage of Dissatisfied Employee Based on Service Category')
ax.set_xlabel('Service Category')
ax.set_ylabel('Percentage')
ax.tick_params(bottom=True, top=False, left=False, right=False, labelbottom=True)
sns.despine(left=True)
Let's also visualize this data using a pie chart:
# Data to plot pie chart
labels = 'Established', 'Experienced', 'New', 'Veteran'
colours = ['lightskyblue', 'yellowgreen', 'lightcoral', 'gold']
explode = (0.1, 0, 0, 0.1) # "Explodes" the first and last slice
plt.pie(table_s_cat['dissatisfied'], labels=labels, colors=colours, autopct='%1.1f%%', shadow=True, explode=explode, startangle=90)
plt.title('Percentage of Dissatisfied Employee Based on Service Category', bbox={'facecolor':'0.8', 'pad':5})
plt.axis('equal') # Ensures pie is drawn as a circle
plt.show()
Above, we aggregated the data in the `dissatisfied` column and used it to calculate the percentage of dissatisfied people in each group. From this small analysis, we can conclude that employees who spent 7 or more years at a job (the Established and Veteran categories) resigned due to some kind of job dissatisfaction more often, together accounting for more than 60% of all dissatisfied employees. Compared to those two categories, employees with fewer than 7 years of service resigned due to job dissatisfaction less often.
Next, we'll handle the missing values in the other categories. We want to know how many people resigned due to some kind of dissatisfaction, and the percentage by gender. First, let's take a closer look at each column and clean up the `age` column, which is still in an unusable form.
# Values in each column
gender_counts = combined_updated['gender'].value_counts(dropna=False)
print(gender_counts, '\n')
service_counts = combined_updated['service_cat'].value_counts(dropna=False)
print(service_counts, '\n')
age_counts = combined_updated['age'].value_counts(dropna=False)
print(age_counts)
# Extracts the first age in each age group
combined_updated['age'] = combined_updated['age'].str.extract(r"(\d+)").astype(float)
combined_updated['age'].value_counts(dropna=False)
The `age` column now looks good and can be analyzed. We can also observe that there are 55 missing values in the column. Before deciding what to do with them, let's first look at the missing values across the columns we're interested in.
# Creates heatmap to locate missing values
map_index = combined_updated.set_index('dissatisfied')
sns.heatmap(map_index.isnull(), cbar=False)
Above, we created a heatmap to check the location of the missing values in our dataframe. The columns essential for our next analysis (`age`, `gender` and `service_cat`) all have roughly the same number of missing values, mostly in the same rows. We'll continue by filling the missing `age` data with the mean of all ages. It's unnecessary to try to fill the other missing values, so we'll leave the rest of the dataset as is.
# Calculate the mean age
mean = combined_updated['age'].mean()
# Fill missing age with the mean value
combined_updated['age'] = combined_updated['age'].fillna(mean)
# Display the unique values in the age column
combined_updated['age'].value_counts(dropna=False)
Next, we'll group the ages into four different categories. These categories include workers in their:
# A function to group age
def categorize_age(age):
    if age >= 50:
        return '50s and above'
    elif 40 <= age < 50:
        return '40s'
    elif 30 <= age < 40:
        return '30s'
    else:
        return '20s and below'
# Apply function to categorize age into groups in a new column
combined_updated['age_cat'] = combined_updated['age'].apply(categorize_age)
combined_updated['age_cat'].value_counts()
As we can now see, we have 139 workers who are in their 20s or below, 189 workers in their 30s, 174 workers in their 40s and 149 workers in their 50s or above. This categorization looks encompassing enough.
Earlier, we aggregated dissatisfied workers based on service category. Right now, we want to do the same, albeit based on age category.
# Calculate and visualize the percentage of dissatisfied employees
table_a_cat = combined_updated.pivot_table(index='age_cat', values='dissatisfied')
print(table_a_cat)
# Data to plot
plt.figure(figsize=(10,10))
sns.set_style('darkgrid')
ax = sns.barplot(x=table_a_cat.index, y=table_a_cat['dissatisfied'], palette=sns.color_palette('GnBu_d'))
ax.set_yticks([])
ax.set_title('Percentage of Dissatisfied Employee Based on Age Category')
ax.set_xlabel('Age Category')
ax.set_ylabel('Percentage')
ax.tick_params(bottom=True, top=False, left=False, right=False, labelbottom=True)
sns.despine(left=True)
Let's also visualize this data using a pie chart below:
labels = '20s and below', '30s', '40s', '50s and above'
colours = ['lightcoral', 'lightskyblue', 'gold', 'yellowgreen']
explode = (0.05, 0.05, 0.05, 0.1)
plt.pie(table_a_cat['dissatisfied'], labels=labels, colors=colours, autopct='%1.1f%%', shadow=True, explode=explode, startangle=90)
plt.title('Percentage of Dissatisfied Employee Based on Age Category', bbox={'facecolor':'0.8', 'pad':5})
plt.axis('equal') # Ensures pie is drawn as a circle
plt.show()
In this project, we set out to answer some important questions regarding workers' resignations. From the analysis we performed, we found that 240 workers in total resigned due to some kind of job dissatisfaction. We also found that, compared to employees with fewer than 7 years of service, more employees who spent 7 or more years at a job (the Established and Veteran categories) resigned due to dissatisfaction, together making up more than 60% of all dissatisfied employees.
Further into the analysis, we noticed that resignations due to dissatisfaction generally rise with age: the percentage of workers who resigned due to job dissatisfaction increases progressively from their 30s through their 40s, with the highest percentage among those 50 years of age and above. However, workers in their 20s and below resigned due to job dissatisfaction more often than those in their 30s.
One possible explanation is that many workers in their 20s are recent graduates and new entrants to the workforce, often single or at least unmarried. Fresh out of college and in lockstep with their peers, they carry high expectations and expect the workplace to be an ideal, perfect fit. When those expectations aren't met, they may grow dissatisfied with their jobs. By their 30s, individuals have often made more peace with the world, developing greater emotional, moral and intellectual alignment with it; that added self-awareness may let them put up with more job dissatisfaction. As this analysis suggests, that tolerance then appears to wane again as individuals grow much older.
Here, we want to present the numbers of people in each service category and the gender differences.
# Dataframe of dissatisfied employees in the Veteran category
dissatisfied_veteran = combined_updated.loc[(combined_updated['dissatisfied'] == True) & (combined_updated['service_cat'] == 'Veteran')]
print('Total Dissatisfied Veteran Workers = ' + str(len(dissatisfied_veteran)))
dissatisfied_veteran['gender'].value_counts(dropna=False)
# Dataframe of dissatisfied employees in the Established category
dissatisfied_established = combined_updated.loc[(combined_updated['dissatisfied'] == True) & (combined_updated['service_cat'] == 'Established')]
print('Total Dissatisfied Established Workers = ' + str(len(dissatisfied_established)))
dissatisfied_established['gender'].value_counts(dropna=False)
# Dataframe of dissatisfied employees in the Experienced category
dissatisfied_experienced = combined_updated.loc[(combined_updated['dissatisfied'] == True) & (combined_updated['service_cat'] == 'Experienced')]
print('Total Dissatisfied Experienced Workers = ' + str(len(dissatisfied_experienced)))
dissatisfied_experienced['gender'].value_counts(dropna=False)
# Dataframe of dissatisfied employees in the New category
dissatisfied_new = combined_updated.loc[(combined_updated['dissatisfied'] == True) & (combined_updated['service_cat'] == 'New')]
print('Total Dissatisfied New Workers = ' + str(len(dissatisfied_new)))
dissatisfied_new['gender'].value_counts(dropna=False)