In this guided project, we'll work with a dataset of used cars from eBay Kleinanzeigen, a classifieds section of the German eBay website.
The data dictionary provided with the data is as follows:
# Importing pandas library.
import pandas as pd
# Opening the dataset.
autos = pd.read_csv('autos.csv',encoding='Latin-1')
autos.head()
# A neat feature of Jupyter Notebook is that it renders the first few and last few rows
# of any pandas object.
autos
# Getting some details about the dataframe.
autos.info()
# Converting the column names from camelCase to snake_case.
# Copying the dataframe.
autos_copy = autos.copy()
columns = autos_copy.columns
# Dictionary to map old names to new names.
mapping_columns = {
'dateCrawled':'date_crawled',
'name':'name',
'seller':'seller',
'offerType':'offer_type',
'price':'price',
'abtest':'abtest',
'vehicleType':'vehicle_type',
'yearOfRegistration':'registration_year',
'gearbox':'gearbox',
'powerPS':'power_ps',
'model':'model',
'odometer':'odometer',
'monthOfRegistration':'registration_month',
'fuelType':'fuel_type',
'brand':'brand',
'notRepairedDamage':'unrepaired_damage',
'dateCreated':'ad_created',
'nrOfPictures':'nr_of_pictures',
'postalCode':'postal_code',
'lastSeen':'last_seen'}
# Updating the column names.
autos_copy.columns = columns.map(mapping_columns)
autos_copy.head()
I've made the following edits to the column names:
yearOfRegistration to registration_year
monthOfRegistration to registration_month
notRepairedDamage to unrepaired_damage
dateCreated to ad_created
The rest of the column names were converted from camelCase to snake_case. I did this to make the names clearer and because snake_case is the conventional naming style in Python.
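The mechanical part of this renaming could also be done programmatically rather than with a hand-written dictionary. The following is a minimal sketch, not the approach used in this notebook: `camel_to_snake`, `overrides`, and `rename_column` are hypothetical names introduced for illustration, and the four manual renames listed above are kept as explicit overrides.

```python
import re

def camel_to_snake(name):
    """Convert a camelCase name to snake_case (hypothetical helper)."""
    # Insert an underscore wherever a lowercase letter or digit is followed
    # by an uppercase letter, then lowercase the whole name.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()

# The four names that were reworded by hand, not just re-cased.
overrides = {
    "yearOfRegistration": "registration_year",
    "monthOfRegistration": "registration_month",
    "notRepairedDamage": "unrepaired_damage",
    "dateCreated": "ad_created",
}

def rename_column(name):
    """Use the manual override if one exists, else the automatic conversion."""
    return overrides.get(name, camel_to_snake(name))

# A few of the original column names, for demonstration.
sample_names = ["dateCrawled", "powerPS", "nrOfPictures", "yearOfRegistration", "abtest"]
new_names = [rename_column(name) for name in sample_names]
print(new_names)
```

With a dataframe, the same function could be passed to `autos_copy.rename(columns=rename_column)`, since `DataFrame.rename` accepts a callable for `columns`.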
# Getting some statistics about the dataframe.
autos_copy.describe(include='all')
These are some important remarks about the dataframe.
# autos_copy.info()
# autos_copy.loc[:,"offer_type"].value_counts(dropna=False)
# autos_copy.loc[:,"price"].value_counts(dropna=False).head(10)
# autos_copy.loc[:,"odometer"].value_counts(dropna=False).sort_values(ascending=False).head(10)
# autos_copy.loc[:,"nr_of_pictures"].value_counts(dropna=False).sort_values(ascending=False).head(10)
# autos_copy.loc[:,"seller"].value_counts(dropna=False).sort_values(ascending=False).head(10)
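# Converting the column "odometer" from str to float and renaming it to "odometer_km".
# Plain column assignment is used because assigning through .loc[:, col] can keep the old object dtype.
autos_copy["odometer"] = autos_copy["odometer"].str.replace('km', '', regex=False).str.replace(',', '', regex=False)
autos_copy["odometer"] = autos_copy["odometer"].astype(float)
autos_copy.rename({"odometer": "odometer_km"}, axis=1, inplace=True)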
autos_copy.head(2)
# Converting the column "price" from str to float.
# regex=False makes '$' a literal character rather than a regular-expression anchor.
autos_copy["price"] = autos_copy["price"].str.replace('$', '', regex=False).str.replace(',', '', regex=False)
autos_copy["price"] = autos_copy["price"].astype(float)
autos_copy.head(2)
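The two conversions above follow the same pattern: strip the non-numeric symbols, then cast to float. Below is a minimal, self-contained sketch of that pattern with a dtype check at the end. The sample values are made up for illustration (they only mimic the `$` / `,` / `km` formatting the code above removes), and `to_number` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

# Hypothetical sample values in the same text format as the raw columns.
sample = pd.DataFrame({
    "price": ["$5,000", "$8,500", "$0"],
    "odometer": ["150,000km", "70,000km", "5,000km"],
})

def to_number(series, symbols):
    """Remove the given literal symbols and thousands separators, then cast to float."""
    cleaned = series
    for symbol in list(symbols) + [","]:
        # regex=False treats each symbol as plain text, which matters for '$'.
        cleaned = cleaned.str.replace(symbol, "", regex=False)
    return cleaned.astype(float)

sample["price"] = to_number(sample["price"], ["$"])
sample["odometer_km"] = to_number(sample["odometer"], ["km"])
sample = sample.drop(columns="odometer")

# Confirm that both columns are now numeric.
print(sample.dtypes)
```

Checking `dtypes` (or `autos_copy.info()`) after the conversion is a quick way to confirm that the columns really are numeric before computing statistics on them.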