import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from datetime import datetime, date
plt.style.use('ggplot')
# Loading the New Customer Data from the excel file
new_cust = pd.read_excel('Raw_data.xlsx' , sheet_name='NewCustomerList')
# Checking first 5 records from New Customer Data
new_cust.head(5)
| | first_name | last_name | gender | past_3_years_bike_related_purchases | DOB | job_title | job_industry_category | wealth_segment | deceased_indicator | owns_car | ... | state | country | property_valuation | Unnamed: 16 | Unnamed: 17 | Unnamed: 18 | Unnamed: 19 | Unnamed: 20 | Rank | Value |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Chickie | Brister | Male | 86 | 1957-07-12 | General Manager | Manufacturing | Mass Customer | N | Yes | ... | QLD | Australia | 6 | 0.81 | 1.0125 | 1.265625 | 1.075781 | 1 | 1 | 1.718750 |
1 | Morly | Genery | Male | 69 | 1970-03-22 | Structural Engineer | Property | Mass Customer | N | No | ... | NSW | Australia | 11 | 0.75 | 0.7500 | 0.937500 | 0.796875 | 1 | 1 | 1.718750 |
2 | Ardelis | Forrester | Female | 10 | 1974-08-28 | Senior Cost Accountant | Financial Services | Affluent Customer | N | No | ... | VIC | Australia | 5 | 0.71 | 0.7100 | 0.710000 | 0.710000 | 1 | 1 | 1.718750 |
3 | Lucine | Stutt | Female | 64 | 1979-01-28 | Account Representative III | Manufacturing | Affluent Customer | N | Yes | ... | QLD | Australia | 1 | 0.50 | 0.6250 | 0.625000 | 0.625000 | 4 | 4 | 1.703125 |
4 | Melinda | Hadlee | Female | 34 | 1965-09-21 | Financial Analyst | Financial Services | Affluent Customer | N | No | ... | NSW | Australia | 9 | 0.99 | 0.9900 | 1.237500 | 1.237500 | 4 | 4 | 1.703125 |
5 rows × 23 columns
new_cust.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 23 columns):
first_name 1000 non-null object
last_name 971 non-null object
gender 1000 non-null object
past_3_years_bike_related_purchases 1000 non-null int64
DOB 983 non-null datetime64[ns]
job_title 894 non-null object
job_industry_category 835 non-null object
wealth_segment 1000 non-null object
deceased_indicator 1000 non-null object
owns_car 1000 non-null object
tenure 1000 non-null int64
address 1000 non-null object
postcode 1000 non-null int64
state 1000 non-null object
country 1000 non-null object
property_valuation 1000 non-null int64
Unnamed: 16 1000 non-null float64
Unnamed: 17 1000 non-null float64
Unnamed: 18 1000 non-null float64
Unnamed: 19 1000 non-null float64
Unnamed: 20 1000 non-null int64
Rank 1000 non-null int64
Value 1000 non-null float64
dtypes: datetime64[ns](1), float64(5), int64(6), object(11)
memory usage: 179.8+ KB
The data types of the feature columns look fine. However, the columns 'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19' and 'Unnamed: 20' are irrelevant and should be dropped.
print("Total records (rows) in the dataset : {}".format(new_cust.shape[0]))
print("Total columns (features) in the dataset : {}".format(new_cust.shape[1]))
Total records (rows) in the dataset : 1000
Total columns (features) in the dataset : 23
# select numeric columns
df_numeric = new_cust.select_dtypes(include=[np.number])
numeric_cols = df_numeric.columns.values
print("The numeric columns are :")
print(numeric_cols)
# select non-numeric columns
df_non_numeric = new_cust.select_dtypes(exclude=[np.number])
non_numeric_cols = df_non_numeric.columns.values
print("The non-numeric columns are :")
print(non_numeric_cols)
The numeric columns are :
['past_3_years_bike_related_purchases' 'tenure' 'postcode' 'property_valuation' 'Unnamed: 16' 'Unnamed: 17' 'Unnamed: 18' 'Unnamed: 19' 'Unnamed: 20' 'Rank' 'Value']
The non-numeric columns are :
['first_name' 'last_name' 'gender' 'DOB' 'job_title' 'job_industry_category' 'wealth_segment' 'deceased_indicator' 'owns_car' 'address' 'state' 'country']
The columns 'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19' and 'Unnamed: 20' are irrelevant, so they are dropped.
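Note that `DOB` lands in the non-numeric list because it is a datetime column. A minimal sketch of splitting columns three ways (numeric, datetime, everything else) on a toy frame; the frame and its values are illustrative stand-ins for `new_cust`, not the real data:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for new_cust; values are illustrative only.
toy = pd.DataFrame({
    "tenure": [9, 8],
    "Value": [1.61, 1.38],
    "gender": ["Male", "Female"],
    "DOB": pd.to_datetime(["1990-05-13", "1966-07-29"]),
})

# Numeric columns (ints and floats).
numeric_cols = toy.select_dtypes(include=[np.number]).columns.tolist()
# Datetime columns, kept apart so they are not treated as categorical text.
datetime_cols = toy.select_dtypes(include=["datetime"]).columns.tolist()
# Whatever remains: strings / categoricals.
other_cols = toy.select_dtypes(exclude=[np.number, "datetime"]).columns.tolist()

print(numeric_cols, datetime_cols, other_cols)
```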
new_cust.drop(labels=['Unnamed: 16','Unnamed: 17','Unnamed: 18','Unnamed: 19','Unnamed: 20'], axis=1 , inplace=True)
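The same columns could also be selected by name pattern instead of being listed one by one. A minimal sketch on a toy frame, assuming that every column whose name starts with `Unnamed` is a spreadsheet artifact (an assumption worth checking against the workbook before applying it to `new_cust`):

```python
import pandas as pd

# Toy frame standing in for new_cust; values are illustrative only.
toy = pd.DataFrame({
    "first_name": ["A", "B"],
    "Unnamed: 16": [0.81, 0.75],
    "Unnamed: 17": [1.01, 0.75],
    "Rank": [1, 1],
})

# Collect every column whose name starts with "Unnamed".
unnamed_cols = [c for c in toy.columns if str(c).startswith("Unnamed")]

# drop() returns a new frame here, so the original toy frame is untouched.
cleaned = toy.drop(columns=unnamed_cols)
print(list(cleaned.columns))
```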
Next, check the dataset for missing values. If a feature has missing values, then depending on the situation the feature is either dropped (when a large share of its data is missing) or an appropriate value is imputed in place of the missing entries.
# Total number of missing values
new_cust.isnull().sum()
first_name 0
last_name 29
gender 0
past_3_years_bike_related_purchases 0
DOB 17
job_title 106
job_industry_category 165
wealth_segment 0
deceased_indicator 0
owns_car 0
tenure 0
address 0
postcode 0
state 0
country 0
property_valuation 0
Rank 0
Value 0
dtype: int64
# Percentage of missing values
new_cust.isnull().mean()*100
first_name 0.0
last_name 2.9
gender 0.0
past_3_years_bike_related_purchases 0.0
DOB 1.7
job_title 10.6
job_industry_category 16.5
wealth_segment 0.0
deceased_indicator 0.0
owns_car 0.0
tenure 0.0
address 0.0
postcode 0.0
state 0.0
country 0.0
property_valuation 0.0
Rank 0.0
Value 0.0
dtype: float64
Since every customer has a first name, all customers are identifiable, so a missing last name is acceptable. Null last names are filled with "None".
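The count and percentage views can be combined into one summary, with the drop-or-impute decision attached to each column. A minimal sketch on a toy frame; the 40% cut-off (`MAX_MISSING_PCT`) and the `action` column are hypothetical choices for illustration, not values taken from this analysis:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for new_cust; values are illustrative only.
toy = pd.DataFrame({
    "first_name": ["A", "B", "C", "D"],
    "last_name": ["X", None, "Z", "W"],
    "job_title": [None, None, "Nurse", "Teacher"],
})

MAX_MISSING_PCT = 40.0  # hypothetical threshold, tune per project

# One row per column: absolute and relative amount of missing data.
missing = pd.DataFrame({
    "n_missing": toy.isnull().sum(),
    "pct_missing": toy.isnull().mean() * 100,
})

# Keep only columns that actually have gaps, worst first.
missing = missing[missing["n_missing"] > 0].sort_values("pct_missing", ascending=False)

# Suggest an action: drop heavily missing columns, impute the rest.
missing["action"] = np.where(missing["pct_missing"] > MAX_MISSING_PCT, "drop column", "impute")
print(missing)
```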
new_cust[new_cust['last_name'].isnull()][['first_name']].isnull().sum()
first_name 0 dtype: int64
new_cust[new_cust['last_name'].isnull()]
| | first_name | last_name | gender | past_3_years_bike_related_purchases | DOB | job_title | job_industry_category | wealth_segment | deceased_indicator | owns_car | tenure | address | postcode | state | country | property_valuation | Rank | Value |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
12 | Olag | NaN | Male | 60 | 1990-05-13 | Human Resources Manager | Telecommunications | Mass Customer | N | No | 9 | 0484 North Avenue | 2032 | NSW | Australia | 11 | 13 | 1.609375 |
58 | Whittaker | NaN | Male | 64 | 1966-07-29 | Media Manager III | NaN | Mass Customer | N | Yes | 8 | 683 Florence Way | 3156 | VIC | Australia | 5 | 57 | 1.375000 |
87 | Kahaleel | NaN | Male | 5 | 1942-11-01 | GIS Technical Architect | NaN | High Net Worth | N | No | 13 | 12 Arapahoe Park | 2035 | NSW | Australia | 12 | 88 | 1.314844 |
155 | Bill | NaN | Female | 74 | 1963-04-24 | Human Resources Assistant II | Property | Mass Customer | N | Yes | 19 | 6704 Pine View Lane | 2170 | NSW | Australia | 9 | 155 | 1.200000 |
202 | Glyn | NaN | Male | 47 | 1945-02-13 | General Manager | Manufacturing | Affluent Customer | N | Yes | 21 | 67 Bluejay Plaza | 2300 | NSW | Australia | 9 | 202 | 1.140625 |
326 | Haleigh | NaN | Female | 17 | 1952-05-19 | Senior Sales Associate | Financial Services | Mass Customer | N | Yes | 18 | 49 Jana Point | 4503 | QLD | Australia | 4 | 326 | 1.009375 |
330 | Alon | NaN | Male | 17 | 1999-06-23 | Accountant IV | NaN | Affluent Customer | N | No | 9 | 770 Crest Line Parkway | 4218 | QLD | Australia | 3 | 329 | 1.000000 |
357 | Otis | NaN | Male | 59 | 1971-01-11 | Electrical Engineer | Manufacturing | Affluent Customer | N | No | 12 | 04 Oakridge Plaza | 2075 | NSW | Australia | 11 | 358 | 0.980000 |
419 | Sherill | NaN | Female | 33 | 1991-12-18 | Information Systems Manager | Financial Services | Mass Customer | N | No | 3 | 53 Moulton Avenue | 2880 | NSW | Australia | 1 | 420 | 0.913750 |
442 | Theresina | NaN | Female | 30 | 1987-03-01 | General Manager | Argiculture | Mass Customer | N | Yes | 14 | 253 Katie Junction | 2650 | NSW | Australia | 2 | 441 | 0.901000 |
455 | Laurena | NaN | Female | 21 | 1961-07-31 | VP Sales | NaN | High Net Worth | N | No | 10 | 7 Messerschmidt Crossing | 3810 | VIC | Australia | 6 | 455 | 0.892500 |
474 | Laurie | NaN | Male | 31 | 1979-07-28 | Assistant Media Planner | Entertainment | Mass Customer | N | Yes | 15 | 94 Barby Lane | 2210 | NSW | Australia | 10 | 475 | 0.881875 |
477 | Blondie | NaN | Female | 43 | 1995-10-03 | Actuary | Financial Services | High Net Worth | N | No | 11 | 780 Norway Maple Hill | 2565 | NSW | Australia | 8 | 478 | 0.880000 |
484 | Georgi | NaN | Male | 29 | 1970-01-14 | Assistant Manager | Manufacturing | High Net Worth | N | No | 11 | 59 Garrison Terrace | 3215 | VIC | Australia | 4 | 485 | 0.875500 |
487 | Lucien | NaN | Male | 83 | 1966-09-14 | NaN | Financial Services | High Net Worth | N | Yes | 19 | 777 Fairfield Court | 4305 | QLD | Australia | 3 | 486 | 0.875000 |
494 | Park | NaN | Male | 39 | 1977-11-08 | Nurse Practicioner | IT | Affluent Customer | N | No | 14 | 07 Boyd Drive | 4350 | QLD | Australia | 7 | 495 | 0.863281 |
502 | Cariotta | NaN | Female | 10 | 1974-08-19 | Assistant Media Planner | Entertainment | Affluent Customer | N | Yes | 17 | 2336 Continental Point | 2527 | NSW | Australia | 7 | 502 | 0.858500 |
531 | Amabel | NaN | Female | 71 | 1981-09-14 | Chief Design Engineer | Financial Services | Mass Customer | N | Yes | 9 | 3128 Mallory Pass | 2144 | NSW | Australia | 6 | 530 | 0.828750 |
586 | Raynard | NaN | Male | 32 | 1996-04-13 | Statistician III | Health | Affluent Customer | N | No | 14 | 20187 Loomis Court | 4132 | QLD | Australia | 6 | 587 | 0.786250 |
616 | Mariette | NaN | Female | 47 | 1956-07-05 | Programmer II | Property | Affluent Customer | N | Yes | 17 | 770 Farmco Point | 2049 | NSW | Australia | 11 | 617 | 0.754375 |
755 | Darb | NaN | Male | 80 | 1969-06-04 | Food Chemist | Health | Affluent Customer | N | No | 10 | 780 Bonner Pass | 4034 | QLD | Australia | 5 | 755 | 0.640000 |
767 | Simonette | NaN | Female | 4 | 1990-04-06 | VP Product Management | Manufacturing | Affluent Customer | N | Yes | 6 | 66 Hoffman Court | 2232 | NSW | Australia | 8 | 760 | 0.637500 |
779 | Ashleigh | NaN | Female | 46 | 1996-04-05 | Budget/Accounting Analyst III | NaN | Mass Customer | N | Yes | 6 | 922 Utah Avenue | 3204 | VIC | Australia | 12 | 780 | 0.624219 |
786 | Fey | NaN | Female | 48 | 1957-09-04 | Research Nurse | Health | High Net Worth | N | Yes | 11 | 77 Paget Park | 3147 | VIC | Australia | 12 | 786 | 0.616250 |
813 | Dmitri | NaN | Male | 72 | 1991-02-06 | NaN | Financial Services | High Net Worth | N | Yes | 15 | 4 Mallory Pass | 3690 | VIC | Australia | 4 | 810 | 0.587500 |
839 | Ginger | NaN | Male | 94 | 1939-02-19 | Human Resources Manager | NaN | Mass Customer | N | No | 11 | 160 Fremont Point | 2259 | NSW | Australia | 8 | 840 | 0.571094 |
849 | Leeland | NaN | Male | 66 | 1957-01-24 | VP Quality Control | Telecommunications | High Net Worth | N | No | 12 | 9 Stephen Center | 4122 | QLD | Australia | 4 | 845 | 0.563125 |
888 | Antoinette | NaN | Female | 72 | 1980-07-28 | Structural Analysis Engineer | Financial Services | Affluent Customer | N | No | 5 | 9 Derek Alley | 3058 | VIC | Australia | 9 | 888 | 0.525000 |
952 | Candy | NaN | Female | 23 | 1977-12-08 | NaN | Financial Services | Mass Customer | N | No | 6 | 59252 Maryland Drive | 3500 | VIC | Australia | 3 | 951 | 0.450500 |
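# Assign the result back; fillna(inplace=True) on a selected column may not update new_cust in newer pandas versions.
new_cust['last_name'] = new_cust['last_name'].fillna('None')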
new_cust['last_name'].isnull().sum()
0
Currently there are no missing values for Last Name column.
new_cust[new_cust['DOB'].isnull()]
| | first_name | last_name | gender | past_3_years_bike_related_purchases | DOB | job_title | job_industry_category | wealth_segment | deceased_indicator | owns_car | tenure | address | postcode | state | country | property_valuation | Rank | Value |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
59 | Normy | Goodinge | U | 5 | NaT | Associate Professor | IT | Mass Customer | N | No | 4 | 7232 Fulton Parkway | 3810 | VIC | Australia | 5 | 57 | 1.375000 |
226 | Hatti | Carletti | U | 35 | NaT | Legal Assistant | IT | Affluent Customer | N | Yes | 11 | 6 Iowa Center | 2519 | NSW | Australia | 9 | 226 | 1.112500 |
324 | Rozamond | Turtle | U | 69 | NaT | Legal Assistant | IT | Mass Customer | N | Yes | 3 | 57025 New Castle Street | 3850 | VIC | Australia | 3 | 324 | 1.010000 |
358 | Tamas | Swatman | U | 65 | NaT | Assistant Media Planner | Entertainment | Affluent Customer | N | No | 5 | 78 Clarendon Drive | 4551 | QLD | Australia | 8 | 358 | 0.980000 |
360 | Tracy | Andrejevic | U | 71 | NaT | Programmer II | IT | Mass Customer | N | Yes | 11 | 5675 Burning Wood Trail | 3030 | VIC | Australia | 7 | 361 | 0.977500 |
374 | Agneta | McAmish | U | 66 | NaT | Structural Analysis Engineer | IT | Mass Customer | N | No | 15 | 5773 Acker Way | 4207 | QLD | Australia | 6 | 375 | 0.960000 |
434 | Gregg | Aimeric | U | 52 | NaT | Internal Auditor | IT | Mass Customer | N | No | 7 | 72423 Surrey Street | 3753 | VIC | Australia | 5 | 433 | 0.906250 |
439 | Johna | Bunker | U | 93 | NaT | Tax Accountant | IT | Mass Customer | N | Yes | 14 | 3686 Waubesa Way | 3065 | VIC | Australia | 6 | 436 | 0.903125 |
574 | Harlene | Nono | U | 69 | NaT | Human Resources Manager | IT | Mass Customer | N | No | 12 | 0307 Namekagon Crossing | 2170 | NSW | Australia | 7 | 575 | 0.796875 |
598 | Gerianne | Kaysor | U | 15 | NaT | Project Manager | IT | Affluent Customer | N | No | 5 | 882 Toban Lane | 2121 | NSW | Australia | 11 | 599 | 0.775000 |
664 | Chicky | Sinclar | U | 43 | NaT | Operator | IT | High Net Worth | N | Yes | 0 | 5 Red Cloud Place | 3222 | VIC | Australia | 4 | 662 | 0.711875 |
751 | Adriana | Saundercock | U | 20 | NaT | Nurse | IT | High Net Worth | N | Yes | 14 | 82 Gina Junction | 3806 | VIC | Australia | 7 | 751 | 0.648125 |
775 | Dmitri | Viant | U | 62 | NaT | Paralegal | Financial Services | Affluent Customer | N | No | 5 | 95960 Warner Parkway | 3842 | VIC | Australia | 1 | 774 | 0.626875 |
835 | Porty | Hansed | U | 88 | NaT | General Manager | IT | Mass Customer | N | No | 13 | 768 Southridge Drive | 2112 | NSW | Australia | 11 | 832 | 0.575000 |
883 | Shara | Bramhill | U | 24 | NaT | NaN | IT | Affluent Customer | N | No | 2 | 01 Bunker Hill Drive | 2230 | NSW | Australia | 10 | 883 | 0.531250 |
904 | Roth | Crum | U | 0 | NaT | Legal Assistant | IT | Mass Customer | N | No | 2 | 276 Anthes Court | 2450 | NSW | Australia | 6 | 904 | 0.500000 |
984 | Pauline | Dallosso | U | 82 | NaT | Desktop Support Technician | IT | Affluent Customer | N | Yes | 0 | 9594 Badeau Street | 2050 | NSW | Australia | 10 | 985 | 0.408000 |
round(new_cust['DOB'].isnull().mean()*100)
2.0
Less than 5% of the records (about 2%) have a null date of birth, so those records can be removed.
# Fetching the index of the records / rows where the DOB is null.
dob_index_drop = new_cust[new_cust['DOB'].isnull()].index
dob_index_drop
Int64Index([ 59, 226, 324, 358, 360, 374, 434, 439, 574, 598, 664, 751, 775, 835, 883, 904, 984], dtype='int64')
new_cust.drop(index=dob_index_drop, inplace=True, axis=0)
new_cust['DOB'].isnull().sum()
0
Currently there are no missing values for DOB.
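The index lookup followed by `drop` can also be written as a single `dropna` call restricted to the `DOB` column. A minimal sketch on a toy frame (values are illustrative only), offered as an equivalent alternative rather than the approach used above:

```python
import pandas as pd

# Toy frame standing in for new_cust; the middle row has no date of birth.
toy = pd.DataFrame({
    "first_name": ["A", "B", "C"],
    "DOB": pd.to_datetime(["1990-05-13", None, "1966-07-29"]),
})

before = len(toy)
# Remove only the rows where DOB is missing; other columns are not inspected.
toy = toy.dropna(subset=["DOB"])
print(f"dropped {before - len(toy)} of {before} rows")
```

The original row labels are kept, so the index has gaps after the drop, just as with `drop(index=...)`.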
# Function to find the age of customers as of today.
def age(born):
    today = date.today()
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))
new_cust['Age'] = new_cust['DOB'].apply(age)
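The tuple comparison in `age` evaluates to `True` (which counts as 1) when this year's birthday has not happened yet, so one year is subtracted. Because `date.today()` makes the result change over time, the sketch below uses a variant with an explicit reference date; `age_on` and the chosen reference date are hypothetical names and values for illustration:

```python
from datetime import date

def age_on(born, today):
    """Whole years elapsed between born and today (both datetime.date)."""
    # (month, day) tuples compare lexicographically; True subtracts one year
    # when the birthday has not yet occurred in the reference year.
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))

reference = date(2021, 6, 1)  # hypothetical fixed reference date

print(age_on(date(1990, 5, 13), reference))  # 31: birthday already passed
print(age_on(date(1990, 6, 2), reference))   # 30: birthday still to come
```

Fixing the reference date keeps derived columns such as `Age` reproducible between runs.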
Descriptive Statistics of Age column
new_cust['Age'].describe()
count 983.000000
mean 49.581892
std 17.052487
min 19.000000
25% 38.000000
50% 49.000000
75% 63.000000
max 82.000000
Name: Age, dtype: float64
# Viz to find out the Age Distribution
plt.figure(figsize=(15,8))
sns.distplot(new_cust['Age'], kde=False, bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0x245691adf60>
Looking at the Age field, there is no discrepancy in the data.
new_cust['Age Group'] = new_cust['Age'].apply(lambda x : (math.floor(x/10)+1)*10)
# Viz to find out the Age Group Distribution
plt.figure(figsize=(10,8))
sns.distplot(new_cust['Age Group'], kde=False, bins=50)
<matplotlib.axes._subplots.AxesSubplot at 0x24568dddf98>
The largest number of new customers falls in the 50-59 age group (label 60 in the Age Group column, since each label is the upper bound of its decade).
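The `Age Group` value is the upper bound of the decade an age falls into, so ages 50 to 59 all receive the label 60. A small sketch of that mapping next to a more readable range label; `age_band` is a hypothetical helper added here for illustration and is not part of the notebook:

```python
import math

def age_group(age):
    """Upper bound of the decade containing age, same formula as the lambda above."""
    return (math.floor(age / 10) + 1) * 10

def age_band(age):
    """Hypothetical helper: human-readable decade range such as '50-59'."""
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"

for a in (49, 50, 59, 60):
    print(a, age_group(a), age_band(a))
```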
new_cust[new_cust['job_title'].isnull()]
| | first_name | last_name | gender | past_3_years_bike_related_purchases | DOB | job_title | job_industry_category | wealth_segment | deceased_indicator | owns_car | tenure | address | postcode | state | country | property_valuation | Rank | Value | Age | Age Group |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15 | Dukie | Swire | Male | 88 | 1954-03-31 | NaN | Manufacturing | Affluent Customer | N | Yes | 5 | 64 Granby Parkway | 2500 | NSW | Australia | 8 | 16 | 1.562500 | 67 | 70 |
25 | Rourke | Gillbard | Male | 11 | 1945-08-03 | NaN | Property | Mass Customer | N | No | 17 | 75 Cordelia Trail | 4817 | QLD | Australia | 4 | 26 | 1.468750 | 75 | 80 |
29 | Rhona | De Freyne | Female | 45 | 1960-11-22 | NaN | Health | High Net Worth | N | No | 8 | 11184 East Drive | 3056 | VIC | Australia | 10 | 30 | 1.460938 | 60 | 70 |
30 | Sharron | Claibourn | Female | 62 | 1980-01-26 | NaN | Financial Services | High Net Worth | N | Yes | 17 | 555 Hermina Avenue | 2280 | NSW | Australia | 8 | 30 | 1.460938 | 41 | 50 |
37 | Mitchell | MacCague | Male | 58 | 1979-04-11 | NaN | Manufacturing | Mass Customer | N | No | 15 | 240 Acker Avenue | 3190 | VIC | Australia | 8 | 38 | 1.437500 | 42 | 50 |
38 | Garik | Whitwell | Male | 44 | 1955-06-13 | NaN | Property | Mass Customer | N | Yes | 13 | 04 Dexter Way | 3280 | VIC | Australia | 2 | 38 | 1.437500 | 65 | 70 |
39 | Antonin | Britt | Male | 64 | 1993-08-28 | NaN | Manufacturing | Affluent Customer | N | Yes | 8 | 011 Northland Trail | 2160 | NSW | Australia | 9 | 40 | 1.434375 | 27 | 30 |
40 | Vinny | Incogna | Female | 73 | 1953-02-13 | NaN | Health | High Net Worth | N | No | 10 | 8 Grayhawk Circle | 2756 | NSW | Australia | 8 | 40 | 1.434375 | 68 | 70 |
42 | Neile | Argent | Female | 79 | 1946-10-25 | NaN | Retail | Mass Customer | N | No | 8 | 2548 Arrowood Pass | 2024 | NSW | Australia | 10 | 42 | 1.421875 | 74 | 80 |
44 | Brooke | Arling | Male | 76 | 1961-12-05 | NaN | NaN | High Net Worth | N | No | 6 | 6 Melby Center | 3027 | VIC | Australia | 5 | 44 | 1.421094 | 59 | 60 |
50 | Heinrick | Shilstone | Male | 60 | 1978-02-11 | NaN | Manufacturing | Affluent Customer | N | No | 10 | 998 Gale Park | 3174 | VIC | Australia | 8 | 50 | 1.406250 | 43 | 50 |
53 | Odessa | Mc Andrew | Female | 97 | 1981-12-01 | NaN | Property | Mass Customer | N | No | 8 | 31756 Meadow Valley Lane | 2232 | NSW | Australia | 10 | 54 | 1.381250 | 39 | 40 |
74 | Mabelle | Wellbelove | Female | 76 | 1958-04-21 | NaN | Financial Services | Affluent Customer | N | Yes | 19 | 800 Emmet Park | 2219 | NSW | Australia | 9 | 72 | 1.350000 | 63 | 70 |
82 | Esther | Rooson | Female | 14 | 1981-02-22 | NaN | Financial Services | Mass Customer | N | No | 5 | 5186 Main Trail | 2046 | NSW | Australia | 9 | 78 | 1.337500 | 40 | 50 |
92 | Andromache | Bonafacino | Female | 84 | 1977-09-01 | NaN | Retail | Mass Customer | N | No | 11 | 74 Carpenter Street | 2015 | NSW | Australia | 9 | 89 | 1.312500 | 43 | 50 |
94 | Nobe | McAughtry | Male | 25 | 1978-12-14 | NaN | NaN | Mass Customer | N | No | 12 | 1 Orin Hill | 4510 | QLD | Australia | 5 | 89 | 1.312500 | 42 | 50 |
95 | Jehu | Prestedge | Male | 91 | 1999-10-20 | NaN | Manufacturing | High Net Worth | N | Yes | 8 | 88 Annamark Avenue | 2138 | NSW | Australia | 12 | 96 | 1.300000 | 21 | 30 |
109 | Michal | Bryan | Female | 1 | 1969-11-09 | NaN | Manufacturing | Mass Customer | N | Yes | 16 | 4275 Bluestem Pass | 4000 | QLD | Australia | 8 | 104 | 1.287500 | 51 | 60 |
115 | Frederik | Milan | Male | 45 | 1997-11-13 | NaN | Health | Mass Customer | N | No | 5 | 56 Riverside Street | 2546 | NSW | Australia | 5 | 114 | 1.275000 | 23 | 30 |
125 | Elsworth | Abbitt | Male | 71 | 1956-02-08 | NaN | Health | Mass Customer | N | Yes | 6 | 9722 Northport Way | 3500 | VIC | Australia | 3 | 125 | 1.261719 | 65 | 70 |
132 | Sharline | Abyss | Female | 11 | 1960-03-18 | NaN | NaN | Mass Customer | N | Yes | 15 | 367 Bay Point | 4011 | QLD | Australia | 4 | 133 | 1.237500 | 61 | 70 |
133 | Nowell | Preddy | Male | 29 | 1985-07-23 | NaN | Manufacturing | Mass Customer | N | No | 9 | 932 Glendale Avenue | 2173 | NSW | Australia | 9 | 133 | 1.237500 | 35 | 40 |
149 | Bernardine | Delmonti | Female | 39 | 1971-03-31 | NaN | Property | Mass Customer | N | No | 17 | 0721 Meadow Ridge Pass | 2540 | NSW | Australia | 8 | 146 | 1.225000 | 50 | 60 |
200 | Alfonso | Massel | Male | 70 | 1940-12-05 | NaN | NaN | Mass Customer | N | Yes | 13 | 6065 Talisman Crossing | 3977 | VIC | Australia | 7 | 201 | 1.142187 | 80 | 90 |
201 | Engracia | Dobbs | Female | 84 | 1959-04-19 | NaN | Health | Mass Customer | N | No | 15 | 72 Eliot Place | 2250 | NSW | Australia | 8 | 202 | 1.140625 | 62 | 70 |
207 | Jeanne | Darte | Female | 70 | 1955-08-18 | NaN | NaN | Mass Customer | N | Yes | 11 | 3 Homewood Park | 2756 | NSW | Australia | 7 | 206 | 1.137500 | 65 | 70 |
211 | Abbie | Oldman | Male | 82 | 1983-11-26 | NaN | Health | High Net Worth | N | Yes | 5 | 4 North Drive | 2168 | NSW | Australia | 8 | 212 | 1.136875 | 37 | 40 |
219 | Hunfredo | Hayball | Male | 7 | 1994-04-15 | NaN | IT | Affluent Customer | N | No | 3 | 60461 Esch Avenue | 2227 | NSW | Australia | 8 | 219 | 1.125000 | 27 | 30 |
222 | Gretna | Thredder | Female | 62 | 1966-01-08 | NaN | NaN | Mass Customer | N | No | 18 | 1607 Westridge Drive | 2203 | NSW | Australia | 11 | 223 | 1.115625 | 55 | 60 |
224 | Wallace | Newart | Male | 91 | 1977-12-06 | NaN | IT | Mass Customer | N | No | 17 | 29007 Dapin Street | 4650 | QLD | Australia | 1 | 223 | 1.115625 | 43 | 50 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
668 | Cecil | Gant | Male | 52 | 1976-07-16 | NaN | NaN | High Net Worth | N | Yes | 9 | 22435 Barnett Court | 2145 | NSW | Australia | 8 | 668 | 0.705500 | 44 | 50 |
689 | Willard | Booton | Male | 69 | 1938-09-02 | NaN | Health | High Net Worth | N | Yes | 7 | 05 Ronald Regan Alley | 2121 | NSW | Australia | 9 | 688 | 0.697000 | 82 | 90 |
691 | Rockie | MacKibbon | Male | 42 | 1978-04-20 | NaN | NaN | Mass Customer | N | Yes | 13 | 8 Bunker Hill Court | 2298 | NSW | Australia | 8 | 691 | 0.690625 | 43 | 50 |
697 | Thaddus | Joder | Male | 31 | 1957-12-10 | NaN | Manufacturing | Mass Customer | N | No | 7 | 27185 Fisk Drive | 2290 | NSW | Australia | 8 | 698 | 0.690000 | 63 | 70 |
703 | Suzy | Bussens | Female | 44 | 1973-04-29 | NaN | Financial Services | Mass Customer | N | No | 13 | 25 Oneill Alley | 4102 | QLD | Australia | 9 | 700 | 0.687500 | 48 | 50 |
740 | Glory | Chilcott | Female | 49 | 1939-09-09 | NaN | Telecommunications | High Net Worth | N | No | 9 | 4286 Rowland Circle | 4165 | QLD | Australia | 5 | 741 | 0.658750 | 81 | 90 |
752 | Trudie | Phinnessy | Female | 45 | 1960-07-04 | NaN | Financial Services | Mass Customer | N | Yes | 15 | 077 Dennis Lane | 3030 | VIC | Australia | 9 | 751 | 0.648125 | 60 | 70 |
759 | Flore | Cashen | Female | 79 | 1978-06-21 | NaN | Health | High Net Worth | N | No | 17 | 4 Vera Pass | 2640 | NSW | Australia | 4 | 760 | 0.637500 | 42 | 50 |
764 | Hagen | MacCarter | Male | 93 | 1983-02-08 | NaN | Entertainment | Affluent Customer | N | Yes | 15 | 7 Ramsey Trail | 3172 | VIC | Australia | 9 | 760 | 0.637500 | 38 | 40 |
769 | Andrea | Pendle | Female | 86 | 1938-08-05 | NaN | NaN | High Net Worth | N | Yes | 13 | 31281 Meadow Valley Way | 4500 | QLD | Australia | 6 | 760 | 0.637500 | 82 | 90 |
778 | Yuma | Dennick | Male | 40 | 1972-11-10 | NaN | Manufacturing | Mass Customer | N | Yes | 6 | 89244 Macpherson Trail | 2528 | NSW | Australia | 8 | 778 | 0.625000 | 48 | 50 |
800 | Erminie | Rabidge | Female | 64 | 1982-03-09 | NaN | Manufacturing | High Net Worth | N | No | 17 | 1969 Melody Lane | 2170 | NSW | Australia | 8 | 801 | 0.597656 | 39 | 40 |
809 | Dorolice | Osmon | Female | 46 | 1961-01-15 | NaN | Financial Services | Affluent Customer | N | No | 15 | 602 Clove Center | 3046 | VIC | Australia | 6 | 810 | 0.587500 | 60 | 70 |
813 | Dmitri | None | Male | 72 | 1991-02-06 | NaN | Financial Services | High Net Worth | N | Yes | 15 | 4 Mallory Pass | 3690 | VIC | Australia | 4 | 810 | 0.587500 | 30 | 40 |
832 | Leonora | Swetenham | Female | 66 | 1967-10-05 | NaN | IT | Mass Customer | N | Yes | 10 | 660 Hallows Place | 2026 | NSW | Australia | 10 | 832 | 0.575000 | 53 | 60 |
873 | Babara | Sissel | Female | 50 | 1974-06-08 | NaN | IT | Mass Customer | N | Yes | 21 | 5 Ohio Road | 3169 | VIC | Australia | 10 | 871 | 0.541875 | 46 | 50 |
879 | Muffin | Bhar | Male | 44 | 1966-04-07 | NaN | NaN | Affluent Customer | N | No | 19 | 15 Weeping Birch Crossing | 2448 | NSW | Australia | 4 | 879 | 0.537500 | 55 | 60 |
881 | Brigg | Himsworth | Male | 63 | 1973-10-10 | NaN | Telecommunications | Mass Customer | N | Yes | 9 | 771 Union Crossing | 4570 | QLD | Australia | 6 | 882 | 0.535500 | 47 | 50 |
889 | Carr | Hopkynson | Male | 64 | 1971-10-18 | NaN | Manufacturing | Affluent Customer | N | No | 16 | 5990 Fairfield Pass | 2318 | NSW | Australia | 6 | 888 | 0.525000 | 49 | 50 |
899 | Penrod | Tomasicchio | Male | 5 | 1968-05-28 | NaN | Health | High Net Worth | N | No | 19 | 30 Harper Trail | 2318 | NSW | Australia | 9 | 899 | 0.510000 | 52 | 60 |
907 | Dru | Crellim | Female | 57 | 1963-03-04 | NaN | NaN | Mass Customer | N | No | 12 | 90 Morningstar Drive | 3030 | VIC | Australia | 7 | 904 | 0.500000 | 58 | 60 |
910 | Aleece | Feige | Female | 49 | 1975-09-16 | NaN | Manufacturing | Mass Customer | N | No | 18 | 2030 Anderson Lane | 2141 | NSW | Australia | 10 | 904 | 0.500000 | 45 | 50 |
914 | Launce | Gale | Male | 86 | 1939-01-15 | NaN | NaN | Mass Customer | N | No | 21 | 4 Fordem Avenue | 2777 | NSW | Australia | 9 | 913 | 0.499375 | 82 | 90 |
923 | Wilone | Champley | Female | 22 | 1983-11-06 | NaN | Manufacturing | High Net Worth | N | No | 17 | 9346 Lyons Point | 2077 | NSW | Australia | 10 | 924 | 0.488750 | 37 | 40 |
929 | Diane | Furman | Female | 67 | 1993-08-11 | NaN | Manufacturing | Affluent Customer | N | Yes | 13 | 6660 Riverside Circle | 3013 | VIC | Australia | 9 | 930 | 0.478125 | 27 | 30 |
952 | Candy | None | Female | 23 | 1977-12-08 | NaN | Financial Services | Mass Customer | N | No | 6 | 59252 Maryland Drive | 3500 | VIC | Australia | 3 | 951 | 0.450500 | 43 | 50 |
953 | Noami | Cokly | Female | 74 | 1962-09-17 | NaN | Manufacturing | Mass Customer | N | Yes | 15 | 2886 Buena Vista Terrace | 2038 | NSW | Australia | 11 | 954 | 0.450000 | 58 | 60 |
971 | Frieda | Tavinor | Female | 43 | 1999-03-04 | NaN | NaN | Affluent Customer | N | No | 10 | 7 Mallory Lane | 3064 | VIC | Australia | 6 | 972 | 0.430000 | 22 | 30 |
972 | Ellwood | Budden | Male | 82 | 1998-06-03 | NaN | Health | Mass Customer | N | Yes | 11 | 79907 Randy Center | 2192 | NSW | Australia | 10 | 972 | 0.430000 | 22 | 30 |
989 | Kellen | Pawelski | Female | 83 | 1945-07-26 | NaN | Manufacturing | High Net Worth | N | Yes | 11 | 125 Manufacturers Parkway | 2193 | NSW | Australia | 8 | 988 | 0.399500 | 75 | 80 |
105 rows × 20 columns
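About 11% of the Job Title values are missing, so the null values are replaced with "Missing".
# Assign the result back; fillna(inplace=True) on a selected column may not update new_cust in newer pandas versions.
new_cust['job_title'] = new_cust['job_title'].fillna('Missing')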
new_cust['job_title'].isnull().sum()
0
Currently there are no missing values for Job Title Column.
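Placeholder fills for several text columns can be expressed in one call by passing a column-to-value mapping to `fillna`. A minimal sketch on a toy frame; the frame contents are illustrative only, and the `placeholders` dict simply mirrors the two fills used so far:

```python
import pandas as pd

# Toy frame standing in for new_cust; values are illustrative only.
toy = pd.DataFrame({
    "first_name": ["A", "B", "C"],
    "last_name": [None, "Y", None],
    "job_title": ["Nurse", None, None],
})

# Column name -> placeholder to use where that column is null.
placeholders = {"last_name": "None", "job_title": "Missing"}

# fillna with a dict only touches the listed columns; assign the result back.
toy = toy.fillna(value=placeholders)
print(toy.isnull().sum().sum())
```

Keeping the placeholders in one mapping makes it easy to see which sentinel value stands for a missing entry in each column.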
new_cust[new_cust['job_industry_category'].isnull()]
| | first_name | last_name | gender | past_3_years_bike_related_purchases | DOB | job_title | job_industry_category | wealth_segment | deceased_indicator | owns_car | tenure | address | postcode | state | country | property_valuation | Rank | Value | Age | Age Group |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
22 | Otis | Ottey | Male | 26 | 1998-02-05 | Quality Engineer | NaN | Mass Customer | N | No | 3 | 1562 Merchant Street | 4744 | QLD | Australia | 4 | 23 | 1.500000 | 23 | 30 |
23 | Tabbatha | Averill | Female | 5 | 1977-12-17 | Quality Control Specialist | NaN | Affluent Customer | N | Yes | 13 | 663 8th Parkway | 2257 | NSW | Australia | 8 | 23 | 1.500000 | 43 | 50 |
33 | Mikel | McNess | Male | 71 | 1981-09-22 | Nurse | NaN | Mass Customer | N | No | 9 | 3 Pleasure Drive | 4122 | QLD | Australia | 9 | 32 | 1.453125 | 39 | 40 |
36 | Farlie | Petford | Male | 76 | 1968-03-25 | Recruiting Manager | NaN | High Net Worth | N | No | 13 | 2330 Butternut Trail | 2017 | NSW | Australia | 10 | 36 | 1.447656 | 53 | 60 |
43 | Corinna | Suggey | Female | 52 | 1966-09-18 | Design Engineer | NaN | Affluent Customer | N | No | 9 | 938 Ilene Road | 2761 | NSW | Australia | 8 | 44 | 1.421094 | 54 | 60 |
44 | Brooke | Arling | Male | 76 | 1961-12-05 | Missing | NaN | High Net Worth | N | No | 6 | 6 Melby Center | 3027 | VIC | Australia | 5 | 44 | 1.421094 | 59 | 60 |
47 | Jobina | Gobourn | Female | 85 | 1994-12-04 | VP Quality Control | NaN | High Net Worth | N | Yes | 14 | 18 Grim Road | 4305 | QLD | Australia | 4 | 46 | 1.407812 | 26 | 30 |
57 | Marylou | Kirkup | Female | 51 | 1972-10-31 | VP Product Management | NaN | Mass Customer | N | No | 14 | 76733 Sunbrook Terrace | 3196 | VIC | Australia | 9 | 57 | 1.375000 | 48 | 50 |
58 | Whittaker | None | Male | 64 | 1966-07-29 | Media Manager III | NaN | Mass Customer | N | Yes | 8 | 683 Florence Way | 3156 | VIC | Australia | 5 | 57 | 1.375000 | 54 | 60 |
69 | Vivienne | Crayden | Female | 82 | 1988-09-18 | Associate Professor | NaN | High Net Worth | N | Yes | 6 | 69 Algoma Center | 4173 | QLD | Australia | 7 | 68 | 1.354688 | 32 | 40 |
73 | Yancy | Clementet | Male | 5 | 1968-02-16 | Mechanical Systems Engineer | NaN | High Net Worth | N | No | 15 | 9 Union Center | 2147 | NSW | Australia | 9 | 72 | 1.350000 | 53 | 60 |
85 | Pietra | Buckleigh | Female | 9 | 1949-04-29 | Engineer III | NaN | High Net Worth | N | No | 13 | 387 Dixon Alley | 2024 | NSW | Australia | 10 | 85 | 1.325000 | 72 | 80 |
87 | Kahaleel | None | Male | 5 | 1942-11-01 | GIS Technical Architect | NaN | High Net Worth | N | No | 13 | 12 Arapahoe Park | 2035 | NSW | Australia | 12 | 88 | 1.314844 | 78 | 80 |
90 | Ludovico | Juster | Male | 93 | 1992-04-19 | Environmental Specialist | NaN | Affluent Customer | N | No | 15 | 1 Talisman Avenue | 2125 | NSW | Australia | 10 | 89 | 1.312500 | 29 | 30 |
93 | Levy | Abramamov | Male | 94 | 1952-09-21 | Teacher | NaN | Affluent Customer | N | Yes | 14 | 6776 Anderson Center | 4037 | QLD | Australia | 8 | 89 | 1.312500 | 68 | 70 |
94 | Nobe | McAughtry | Male | 25 | 1978-12-14 | Missing | NaN | Mass Customer | N | No | 12 | 1 Orin Hill | 4510 | QLD | Australia | 5 | 89 | 1.312500 | 42 | 50 |
108 | Aridatha | Sephton | Female | 95 | 1961-10-22 | Human Resources Assistant II | NaN | Mass Customer | N | No | 5 | 422 Forster Circle | 2340 | NSW | Australia | 1 | 104 | 1.287500 | 59 | 60 |
112 | David | Napoleon | Male | 72 | 1961-11-05 | Structural Engineer | NaN | High Net Worth | N | No | 14 | 69 Garrison Point | 2223 | NSW | Australia | 11 | 111 | 1.281250 | 59 | 60 |
121 | Alexander | Broadbent | Male | 57 | 1997-05-28 | Desktop Support Technician | NaN | Mass Customer | N | No | 9 | 265 Stephen Trail | 2209 | NSW | Australia | 10 | 120 | 1.262500 | 23 | 30 |
122 | Teddy | Lagadu | Female | 86 | 1969-07-20 | Design Engineer | NaN | High Net Worth | N | No | 6 | 2 Charing Cross Trail | 2759 | NSW | Australia | 8 | 120 | 1.262500 | 51 | 60 |
124 | Ludvig | Andren | Male | 44 | 1941-02-22 | Media Manager III | NaN | High Net Worth | N | Yes | 15 | 578 Waywood Circle | 4306 | QLD | Australia | 5 | 125 | 1.261719 | 80 | 90 |
131 | Farris | Skettles | Male | 38 | 1965-07-03 | Payment Adjustment Coordinator | NaN | Mass Customer | N | Yes | 13 | 49309 Redwing Lane | 3240 | VIC | Australia | 7 | 132 | 1.248438 | 55 | 60 |
132 | Sharline | Abyss | Female | 11 | 1960-03-18 | Missing | NaN | Mass Customer | N | Yes | 15 | 367 Bay Point | 4011 | QLD | Australia | 4 | 133 | 1.237500 | 61 | 70 |
135 | Padraig | Snel | Male | 89 | 1970-11-08 | Staff Accountant II | NaN | Mass Customer | N | No | 19 | 12683 Mifflin Point | 2114 | NSW | Australia | 7 | 133 | 1.237500 | 50 | 60 |
158 | Tedra | Goodbanne | Female | 4 | 1978-01-15 | Senior Quality Engineer | NaN | Mass Customer | N | Yes | 6 | 8 Debs Road | 3934 | VIC | Australia | 9 | 158 | 1.187500 | 43 | 50 |
159 | Roberto | Harme | Male | 27 | 1951-06-11 | Environmental Tech | NaN | High Net Worth | N | No | 10 | 101 Starling Pass | 2564 | NSW | Australia | 9 | 158 | 1.187500 | 69 | 70 |
163 | Fonsie | Levane | Male | 96 | 1951-07-10 | Account Representative III | NaN | High Net Worth | N | Yes | 19 | 83 Armistice Terrace | 4011 | QLD | Australia | 3 | 163 | 1.182031 | 69 | 70 |
164 | Emilie | Brody | Female | 3 | 1979-05-22 | Director of Sales | NaN | Mass Customer | N | Yes | 3 | 5388 Burrows Alley | 2073 | NSW | Australia | 11 | 163 | 1.182031 | 41 | 50 |
170 | Alvira | Coulman | Female | 42 | 1955-06-05 | Account Representative II | NaN | Affluent Customer | N | No | 14 | 823 Wayridge Trail | 2205 | NSW | Australia | 9 | 166 | 1.175000 | 65 | 70 |
176 | Devonne | Alderwick | Female | 79 | 1939-01-29 | Research Associate | NaN | High Net Worth | N | Yes | 9 | 534 Lien Lane | 3122 | VIC | Australia | 7 | 177 | 1.162500 | 82 | 90 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
814 | Maddalena | Hencke | Female | 61 | 1952-12-09 | Help Desk Operator | NaN | High Net Worth | N | No | 22 | 64037 Swallow Crossing | 4170 | QLD | Australia | 5 | 810 | 0.587500 | 68 | 70 |
815 | Rand | Winchcum | Male | 34 | 2000-04-10 | Software Consultant | NaN | Affluent Customer | N | No | 3 | 4594 Jackson Hill | 2146 | NSW | Australia | 7 | 810 | 0.587500 | 21 | 30 |
822 | Brod | Attrey | Male | 46 | 1966-11-05 | Budget/Accounting Analyst III | NaN | Mass Customer | N | Yes | 14 | 180 Lakewood Park | 2194 | NSW | Australia | 8 | 820 | 0.584375 | 54 | 60 |
826 | Herbert | Henryson | Male | 21 | 1995-10-10 | Marketing Manager | NaN | Mass Customer | N | No | 4 | 05123 Bobwhite Plaza | 2528 | NSW | Australia | 9 | 820 | 0.584375 | 25 | 30 |
827 | Cristie | Bence | Female | 49 | 2000-04-17 | Automation Specialist II | NaN | High Net Worth | N | No | 9 | 3413 Schmedeman Court | 4122 | QLD | Australia | 8 | 828 | 0.580000 | 21 | 30 |
837 | Monty | Thomazin | Male | 7 | 1951-09-16 | Quality Engineer | NaN | Mass Customer | N | Yes | 13 | 30738 Muir Avenue | 3105 | VIC | Australia | 10 | 838 | 0.573750 | 69 | 70 |
838 | Briano | Janowski | Male | 66 | 1994-07-17 | Analyst Programmer | NaN | Mass Customer | N | No | 7 | 3259 Eagan Parkway | 2066 | NSW | Australia | 8 | 838 | 0.573750 | 26 | 30 |
839 | Ginger | None | Male | 94 | 1939-02-19 | Human Resources Manager | NaN | Mass Customer | N | No | 11 | 160 Fremont Point | 2259 | NSW | Australia | 8 | 840 | 0.571094 | 82 | 90 |
840 | Logan | Colomb | Male | 74 | 1948-01-01 | Recruiter | NaN | Mass Customer | N | Yes | 19 | 266 Lakewood Terrace | 2761 | NSW | Australia | 8 | 840 | 0.571094 | 73 | 80 |
841 | Nichols | Devinn | Male | 47 | 1979-09-29 | Recruiter | NaN | Affluent Customer | N | No | 11 | 5280 Waxwing Point | 2071 | NSW | Australia | 12 | 842 | 0.570000 | 41 | 50 |
846 | Rosabelle | Godsmark | Female | 60 | 1995-10-19 | Executive Secretary | NaN | Mass Customer | N | Yes | 3 | 4871 Caliangt Hill | 4102 | QLD | Australia | 8 | 845 | 0.563125 | 25 | 30 |
867 | Mycah | Beaston | Male | 11 | 1961-07-31 | Environmental Specialist | NaN | High Net Worth | N | Yes | 12 | 2 Mandrake Street | 2221 | NSW | Australia | 11 | 865 | 0.550000 | 59 | 60 |
879 | Muffin | Bhar | Male | 44 | 1966-04-07 | Missing | NaN | Affluent Customer | N | No | 19 | 15 Weeping Birch Crossing | 2448 | NSW | Australia | 4 | 879 | 0.537500 | 55 | 60 |
882 | Judi | Cazereau | Female | 22 | 1997-03-03 | GIS Technical Architect | NaN | Affluent Customer | N | Yes | 13 | 22 Farmco Avenue | 3851 | VIC | Australia | 3 | 883 | 0.531250 | 24 | 30 |
893 | Jesse | Alflat | Male | 31 | 1984-09-01 | Executive Secretary | NaN | High Net Worth | N | No | 5 | 49 Northfield Drive | 2145 | NSW | Australia | 9 | 893 | 0.520625 | 36 | 40 |
900 | Pancho | Edis | Male | 1 | 1970-12-30 | Assistant Professor | NaN | Mass Customer | N | No | 13 | 64467 Pankratz Pass | 3023 | VIC | Australia | 7 | 899 | 0.510000 | 50 | 60 |
906 | Conway | Juarez | Male | 27 | 1967-03-02 | Help Desk Technician | NaN | Affluent Customer | N | No | 17 | 66904 American Ash Hill | 4814 | QLD | Australia | 5 | 904 | 0.500000 | 54 | 60 |
907 | Dru | Crellim | Female | 57 | 1963-03-04 | Missing | NaN | Mass Customer | N | No | 12 | 90 Morningstar Drive | 3030 | VIC | Australia | 7 | 904 | 0.500000 | 58 | 60 |
913 | Hildegarde | Bamb | Female | 16 | 1961-02-10 | Help Desk Operator | NaN | High Net Worth | N | Yes | 10 | 5070 Division Parkway | 3910 | VIC | Australia | 9 | 913 | 0.499375 | 60 | 70 |
914 | Launce | Gale | Male | 86 | 1939-01-15 | Missing | NaN | Mass Customer | N | No | 21 | 4 Fordem Avenue | 2777 | NSW | Australia | 9 | 913 | 0.499375 | 82 | 90 |
920 | Sheilakathryn | Huff | Female | 45 | 1958-05-15 | Assistant Manager | NaN | High Net Worth | N | Yes | 14 | 04 Miller Drive | 2477 | NSW | Australia | 6 | 921 | 0.490000 | 63 | 70 |
941 | Angele | Cadore | Female | 5 | 1954-09-06 | Chief Design Engineer | NaN | Mass Customer | N | Yes | 7 | 85894 Amoth Court | 4125 | QLD | Australia | 7 | 939 | 0.467500 | 66 | 70 |
950 | Liane | Abelevitz | Female | 26 | 1976-11-25 | Operator | NaN | Mass Customer | N | No | 3 | 85340 Hovde Way | 3153 | VIC | Australia | 7 | 951 | 0.450500 | 44 | 50 |
954 | Lyndell | Jereatt | Female | 14 | 1994-11-28 | Payment Adjustment Coordinator | NaN | High Net Worth | N | No | 13 | 58770 Monterey Plaza | 2122 | NSW | Australia | 12 | 954 | 0.450000 | 26 | 30 |
957 | Rhodie | Gaskall | Female | 83 | 1964-02-01 | VP Quality Control | NaN | Mass Customer | N | Yes | 9 | 251 Pierstorff Alley | 4170 | QLD | Australia | 9 | 956 | 0.446250 | 57 | 60 |
959 | Blondell | Dibdall | Female | 62 | 1967-01-03 | Programmer III | NaN | Mass Customer | N | No | 4 | 34 Bunting Pass | 3048 | VIC | Australia | 4 | 960 | 0.442000 | 54 | 60 |
971 | Frieda | Tavinor | Female | 43 | 1999-03-04 | Missing | NaN | Affluent Customer | N | No | 10 | 7 Mallory Lane | 3064 | VIC | Australia | 6 | 972 | 0.430000 | 22 | 30 |
975 | Amby | Bodega | Male | 63 | 1968-06-12 | Recruiter | NaN | Affluent Customer | N | Yes | 17 | 669 Declaration Street | 3810 | VIC | Australia | 6 | 974 | 0.425000 | 52 | 60 |
980 | Tyne | Anshell | Female | 71 | 1992-04-08 | Mechanical Systems Engineer | NaN | Mass Customer | N | Yes | 3 | 93 Sutherland Terrace | 2560 | NSW | Australia | 8 | 979 | 0.416500 | 29 | 30 |
983 | Augusta | Munns | Female | 5 | 1951-09-17 | Quality Control Specialist | NaN | Mass Customer | N | No | 21 | 607 Memorial Avenue | 2074 | NSW | Australia | 11 | 983 | 0.410000 | 69 | 70 |
165 rows × 20 columns
Since roughly 16% of the Job Industry Category values are missing (the 165 rows listed above), we replace the nulls with 'Missing' instead of dropping those records.
new_cust['job_industry_category'] = new_cust['job_industry_category'].fillna('Missing')
new_cust['job_industry_category'].isnull().sum()
0
The Job Industry Category column now has no missing values.
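The fill step can be sketched on a small made-up frame. The `demo` frame and its values are assumptions for illustration, not rows from the customer list. The sketch measures the share of nulls first and then fills by assignment, which does not depend on in-place modification of a column selection.

```python
import numpy as np
import pandas as pd

# Hypothetical sample, not taken from the real customer list
demo = pd.DataFrame({
    "job_industry_category": ["Manufacturing", np.nan, "Property", np.nan, "Health"]
})

# Share of null values in the column, as a percentage
missing_pct = demo["job_industry_category"].isnull().mean() * 100
print("Missing: {:.1f}%".format(missing_pct))

# Fill by assignment rather than inplace=True on a column selection
demo["job_industry_category"] = demo["job_industry_category"].fillna("Missing")
remaining = int(demo["job_industry_category"].isnull().sum())
print("Remaining nulls:", remaining)
```

Whether to fill or drop is a judgment call. Filling keeps the other attributes of those customers available for later analysis.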
With that, no missing values remain in the dataset.
new_cust.isnull().sum()
first_name                             0
last_name                              0
gender                                 0
past_3_years_bike_related_purchases    0
DOB                                    0
job_title                              0
job_industry_category                  0
wealth_segment                         0
deceased_indicator                     0
owns_car                               0
tenure                                 0
address                                0
postcode                               0
state                                  0
country                                0
property_valuation                     0
Rank                                   0
Value                                  0
Age                                    0
Age Group                              0
dtype: int64
print("Total records after removing Missing Values: {}".format(new_cust.shape[0]))
Total records after removing Missing Values: 983
We will check the categorical columns for inconsistent values or typos.
The columns to be checked are 'gender', 'wealth_segment', 'deceased_indicator' and 'owns_car', followed by the location columns 'state', 'country', 'postcode' and 'address', and finally 'tenure'.
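Instead of reading each value_counts() output by eye, the same check can be automated by comparing every column against a set of allowed labels. This is a rough sketch on made-up data: the `unexpected_labels` helper and the allowed sets are assumptions, chosen to match the labels this notebook reports for those columns.

```python
import pandas as pd

# Hypothetical sample with deliberately inconsistent gender labels
demo = pd.DataFrame({
    "gender": ["Female", "Male", "F", "female ", "Male"],
    "owns_car": ["Yes", "No", "Yes", "No", "Yes"],
})

def unexpected_labels(series, allowed):
    """Return the distinct non-null values of `series` that are not in `allowed`."""
    return sorted(set(series.dropna().unique()) - set(allowed))

# Allowed label sets are assumptions for this sketch
allowed_values = {
    "gender": {"Female", "Male"},
    "owns_car": {"Yes", "No"},
}

report = {col: unexpected_labels(demo[col], allowed)
          for col, allowed in allowed_values.items()}
print(report)
```

An empty list for a column means that every value matched an allowed label.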
There is no inconsistent data in the gender column.
new_cust['gender'].value_counts()
Female 513 Male 470 Name: gender, dtype: int64
There is no inconsistent data in the wealth_segment column.
new_cust['wealth_segment'].value_counts()
Mass Customer 499 High Net Worth 249 Affluent Customer 235 Name: wealth_segment, dtype: int64
There is no inconsistent data in the deceased_indicator column.
new_cust['deceased_indicator'].value_counts()
N 983 Name: deceased_indicator, dtype: int64
There is no inconsistent data in the owns_car column.
new_cust['owns_car'].value_counts()
No 497 Yes 486 Name: owns_car, dtype: int64
There is no inconsistent data in the state column.
new_cust['state'].value_counts()
NSW 499 VIC 258 QLD 226 Name: state, dtype: int64
There is no inconsistent data in the country column.
new_cust['country'].value_counts()
Australia 983 Name: country, dtype: int64
There is no inconsistent data in the postcode column.
new_cust[['postcode', 'state']].drop_duplicates().sort_values('state')
postcode | state | |
---|---|---|
164 | 2073 | NSW |
202 | 2300 | NSW |
616 | 2049 | NSW |
204 | 2429 | NSW |
615 | 2070 | NSW |
208 | 2144 | NSW |
213 | 2165 | NSW |
608 | 2477 | NSW |
216 | 2444 | NSW |
601 | 2103 | NSW |
222 | 2203 | NSW |
223 | 2446 | NSW |
599 | 2096 | NSW |
593 | 2753 | NSW |
198 | 2448 | NSW |
227 | 2099 | NSW |
231 | 2007 | NSW |
233 | 2011 | NSW |
588 | 2539 | NSW |
236 | 2281 | NSW |
237 | 2224 | NSW |
583 | 2574 | NSW |
580 | 2028 | NSW |
571 | 2258 | NSW |
566 | 2030 | NSW |
565 | 2142 | NSW |
564 | 2567 | NSW |
253 | 2166 | NSW |
558 | 2158 | NSW |
228 | 2430 | NSW |
... | ... | ... |
158 | 3934 | VIC |
153 | 3201 | VIC |
148 | 3977 | VIC |
684 | 3860 | VIC |
194 | 3195 | VIC |
195 | 3687 | VIC |
613 | 3782 | VIC |
212 | 3021 | VIC |
277 | 3804 | VIC |
275 | 3976 | VIC |
537 | 3028 | VIC |
538 | 3185 | VIC |
270 | 3199 | VIC |
544 | 3200 | VIC |
260 | 3103 | VIC |
556 | 3335 | VIC |
256 | 3206 | VIC |
139 | 3081 | VIC |
255 | 3031 | VIC |
567 | 3177 | VIC |
242 | 3004 | VIC |
579 | 3133 | VIC |
240 | 3170 | VIC |
584 | 3585 | VIC |
585 | 3677 | VIC |
234 | 3429 | VIC |
589 | 3037 | VIC |
604 | 3129 | VIC |
245 | 3134 | VIC |
341 | 3163 | VIC |
515 rows × 2 columns
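With 515 distinct postcode and state pairs, a visual scan is easy to get wrong. One possible programmatic check uses the leading digit of the postcode (2 for NSW, 3 for VIC, 4 for QLD). This is a simplified rule assumed for the sketch: it ignores special postcode ranges and runs on made-up rows.

```python
import pandas as pd

# Hypothetical sample; the last row is deliberately mismatched
demo = pd.DataFrame({
    "postcode": [2073, 3934, 4510, 3122],
    "state": ["NSW", "VIC", "QLD", "NSW"],
})

# Simplified leading-digit rule (an assumption; special ranges are ignored)
expected_prefix = {"NSW": "2", "VIC": "3", "QLD": "4"}

# Compare the first digit of each postcode with the digit expected for its state
first_digit = demo["postcode"].astype(str).str[0]
mismatch = demo[first_digit != demo["state"].map(expected_prefix)]
print(mismatch)
```

Rows returned in `mismatch` are candidates for manual review, not confirmed errors.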
There is no inconsistent data in the address column.
new_cust[['address', 'postcode','state','country']].sort_values('address')
address | postcode | state | country | |
---|---|---|---|---|
721 | 0 Bay Drive | 2750 | NSW | Australia |
138 | 0 Dexter Parkway | 2380 | NSW | Australia |
624 | 0 Emmet Trail | 4128 | QLD | Australia |
300 | 0 Esker Avenue | 4019 | QLD | Australia |
685 | 0 Express Lane | 2142 | NSW | Australia |
546 | 0 Kipling Way | 2289 | NSW | Australia |
644 | 0 Larry Park | 3175 | VIC | Australia |
305 | 0 Mayfield Parkway | 4272 | QLD | Australia |
99 | 0 Meadow Ridge Street | 3173 | VIC | Australia |
469 | 0 Memorial Road | 3109 | VIC | Australia |
78 | 0 Mockingbird Plaza | 2212 | NSW | Australia |
524 | 0 Nelson Crossing | 3155 | VIC | Australia |
325 | 0 Stoughton Park | 3000 | VIC | Australia |
894 | 0 Summit Center | 4019 | QLD | Australia |
446 | 0 Union Parkway | 3142 | VIC | Australia |
501 | 0 Veith Way | 2009 | NSW | Australia |
413 | 00 Judy Terrace | 2035 | NSW | Australia |
757 | 00 Southridge Avenue | 2036 | NSW | Australia |
298 | 00003 Hoffman Pass | 2560 | NSW | Australia |
803 | 005 Kensington Street | 4165 | QLD | Australia |
456 | 005 Loeprich Way | 4680 | QLD | Australia |
642 | 01 Reindahl Circle | 4132 | QLD | Australia |
39 | 011 Northland Trail | 2160 | NSW | Australia |
333 | 01124 Dottie Lane | 3630 | VIC | Australia |
214 | 013 David Junction | 4211 | QLD | Australia |
315 | 016 Westport Park | 3073 | VIC | Australia |
179 | 0193 Northland Street | 4179 | QLD | Australia |
353 | 0197 Sachs Avenue | 2747 | NSW | Australia |
198 | 02 Hoffman Road | 2448 | NSW | Australia |
392 | 02 Roth Drive | 2022 | NSW | Australia |
... | ... | ... | ... | ... |
942 | 955 Burning Wood Way | 2478 | NSW | Australia |
585 | 95796 Mcbride Drive | 3677 | VIC | Australia |
795 | 96 Hermina Place | 4350 | QLD | Australia |
451 | 96 Rutledge Drive | 3064 | VIC | Australia |
72 | 9608 Heffernan Drive | 4068 | QLD | Australia |
712 | 96081 Lakewood Hill | 4650 | QLD | Australia |
65 | 9630 Cottonwood Avenue | 2168 | NSW | Australia |
454 | 9645 Moose Terrace | 2137 | NSW | Australia |
196 | 96515 Di Loreto Pass | 4109 | QLD | Australia |
102 | 966 Sunnyside Center | 2390 | NSW | Australia |
620 | 97 Merrick Center | 2460 | NSW | Australia |
395 | 97 Transport Plaza | 2097 | NSW | Australia |
125 | 9722 Northport Way | 3500 | VIC | Australia |
378 | 9736 Mitchell Pass | 3199 | VIC | Australia |
610 | 976 Roxbury Alley | 4157 | QLD | Australia |
633 | 98 Shoshone Road | 4207 | QLD | Australia |
156 | 98158 Alpine Point | 4212 | QLD | Australia |
192 | 98221 Pennsylvania Place | 2170 | NSW | Australia |
286 | 984 Del Sol Junction | 4659 | QLD | Australia |
761 | 98454 Dapin Park | 4556 | QLD | Australia |
482 | 98555 Victoria Hill | 2171 | NSW | Australia |
66 | 989 Graedel Terrace | 4208 | QLD | Australia |
960 | 99 Park Meadow Hill | 2570 | NSW | Australia |
488 | 99 Quincy Parkway | 3630 | VIC | Australia |
654 | 99 Sherman Parkway | 3083 | VIC | Australia |
308 | 99 Westend Court | 2287 | NSW | Australia |
336 | 990 Hoffman Avenue | 3029 | VIC | Australia |
796 | 99376 Namekagon Street | 3101 | VIC | Australia |
583 | 9940 Manley Drive | 2574 | NSW | Australia |
50 | 998 Gale Park | 3174 | VIC | Australia |
983 rows × 4 columns
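Sorting the addresses helps with a visual scan. Two simple programmatic checks can complement it: stray whitespace and repeated addresses. This is only a sketch on invented addresses, and the `padded` and `repeated` names are illustrative.

```python
import pandas as pd

# Hypothetical addresses, not taken from the customer list
demo = pd.DataFrame({
    "address": ["1 Example Street", "22 Sample Court ", "333 Demo Park", "1 Example Street"],
    "postcode": [2000, 3000, 4000, 2000],
})

# Addresses with leading or trailing whitespace
padded = demo[demo["address"] != demo["address"].str.strip()]

# Addresses that appear more than once (all occurrences kept for review)
repeated = demo[demo["address"].duplicated(keep=False)]

print(len(padded), len(repeated))
```

A repeated address is not necessarily an error, since several customers can share a household.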
There is no inconsistent data in the tenure column. The values range from 1 to 22, and the distribution shows no obvious anomalies.
new_cust['tenure'].describe()
count    983.000000
mean      11.459817
std        5.006123
min        1.000000
25%        8.000000
50%       11.000000
75%       15.000000
max       22.000000
Name: tenure, dtype: float64
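# Distribution of tenure
plt.figure(figsize=(15,8))
# distplot is deprecated in recent seaborn releases; histplot with a KDE overlay gives a comparable view
sns.histplot(new_cust['tenure'], kde=True, stat='density')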
<matplotlib.axes._subplots.AxesSubplot at 0x245697f5b00>
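A numeric check can back up the visual impression of the plot. The sketch below applies the common 1.5 × IQR rule to a made-up tenure series. The rule and the sample values are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

# Hypothetical tenure values; 60 is a deliberately implausible entry
tenure = pd.Series([3, 8, 9, 11, 12, 15, 19, 60], name="tenure")

# Interquartile range and the usual 1.5 * IQR fences
q1, q3 = tenure.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values outside the fences, plus any negative tenures
outliers = tenure[(tenure < lower) | (tenure > upper)]
negatives = tenure[tenure < 0]
print(outliers.tolist(), negatives.tolist())
```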
We need to ensure that there are no duplicate records in the dataset, since duplicates degrade data quality and can lead to errors in the analysis. Any duplicate rows must be dropped.
To check for duplicate records, we would normally remove the primary key column first and then apply the drop_duplicates() function provided by pandas. The New Customer list has no customer_id column, so drop_duplicates() is applied to the dataset directly.
new_cust_dedupped = new_cust.drop_duplicates()
print("Number of records after removing duplicates : {}".format(new_cust_dedupped.shape[0]))
print("Number of records in original dataset : {}".format(new_cust.shape[0]))
Number of records after removing duplicates : 983
Number of records in original dataset : 983
Since both numbers are the same, there are no duplicate records in the dataset.
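The reason for excluding a primary key before the comparison can be seen on a small made-up frame. The `customer_id` column here is hypothetical, since the New Customer list has none. With the key included every row looks unique, while dropping it exposes rows that repeat the same attributes.

```python
import pandas as pd

# Hypothetical frame with a primary key column
demo = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "first_name": ["Ann", "Bob", "Ann"],
    "postcode": [2000, 3000, 2000],
})

# With the key included every row is unique, so nothing is flagged
dupes_with_key = int(demo.duplicated().sum())

# Dropping the key first exposes rows that repeat the same attributes
dupes_without_key = int(demo.drop(columns="customer_id").duplicated().sum())

print(dupes_with_key, dupes_without_key)
```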
new_cust.to_csv('NewCustomerList_Cleaned.csv', index=False)
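CSV stores everything as text, so dtype information such as the datetime type of DOB is not preserved in the file. A small round-trip sketch shows how the dates can be parsed again on load. It uses an in-memory buffer in place of the real file, and the sample rows are made up.

```python
import io
import pandas as pd

# Hypothetical sample with a datetime column
demo = pd.DataFrame({
    "first_name": ["Ann", "Bob"],
    "DOB": pd.to_datetime(["1970-03-22", "1995-10-19"]),
    "tenure": [11, 3],
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
demo.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores the datetime dtype that CSV does not preserve
restored = pd.read_csv(buffer, parse_dates=["DOB"])
print(restored.dtypes)
```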