In this guided project, we'll work with a dataset of used cars from eBay Kleinanzeigen, a classifieds section of the German eBay website.
The dataset was originally scraped and uploaded to Kaggle. A few modifications have been made from the original dataset that was uploaded to Kaggle:
The file and relevant libraries have been uploaded below
import pandas as pd
import numpy as np
autos = pd.read_csv('autos.csv', encoding = "Latin-1")
price
autos
autos.info()
autos.head()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 50000 entries, 0 to 49999 Data columns (total 20 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 dateCrawled 50000 non-null object 1 name 50000 non-null object 2 seller 50000 non-null object 3 offerType 50000 non-null object 4 price 50000 non-null object 5 abtest 50000 non-null object 6 vehicleType 44905 non-null object 7 yearOfRegistration 50000 non-null int64 8 gearbox 47320 non-null object 9 powerPS 50000 non-null int64 10 model 47242 non-null object 11 odometer 50000 non-null object 12 monthOfRegistration 50000 non-null int64 13 fuelType 45518 non-null object 14 brand 50000 non-null object 15 notRepairedDamage 40171 non-null object 16 dateCreated 50000 non-null object 17 nrOfPictures 50000 non-null int64 18 postalCode 50000 non-null int64 19 lastSeen 50000 non-null object dtypes: int64(5), object(15) memory usage: 7.6+ MB
dateCrawled | name | seller | offerType | price | abtest | vehicleType | yearOfRegistration | gearbox | powerPS | model | odometer | monthOfRegistration | fuelType | brand | notRepairedDamage | dateCreated | nrOfPictures | postalCode | lastSeen | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-26 17:47:46 | Peugeot_807_160_NAVTECH_ON_BOARD | privat | Angebot | $5,000 | control | bus | 2004 | manuell | 158 | andere | 150,000km | 3 | lpg | peugeot | nein | 2016-03-26 00:00:00 | 0 | 79588 | 2016-04-06 06:45:54 |
1 | 2016-04-04 13:38:56 | BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik | privat | Angebot | $8,500 | control | limousine | 1997 | automatik | 286 | 7er | 150,000km | 6 | benzin | bmw | nein | 2016-04-04 00:00:00 | 0 | 71034 | 2016-04-06 14:45:08 |
2 | 2016-03-26 18:57:24 | Volkswagen_Golf_1.6_United | privat | Angebot | $8,990 | test | limousine | 2009 | manuell | 102 | golf | 70,000km | 7 | benzin | volkswagen | nein | 2016-03-26 00:00:00 | 0 | 35394 | 2016-04-06 20:15:37 |
3 | 2016-03-12 16:58:10 | Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... | privat | Angebot | $4,350 | control | kleinwagen | 2007 | automatik | 71 | fortwo | 70,000km | 6 | benzin | smart | nein | 2016-03-12 00:00:00 | 0 | 33729 | 2016-03-15 03:16:28 |
4 | 2016-04-01 14:38:50 | Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... | privat | Angebot | $1,350 | test | kombi | 2003 | manuell | 0 | focus | 150,000km | 7 | benzin | ford | nein | 2016-04-01 00:00:00 | 0 | 39218 | 2016-04-01 14:38:50 |
autos.columns
Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest', 'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model', 'odometer', 'monthOfRegistration', 'fuelType', 'brand', 'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode', 'lastSeen'], dtype='object')
New_headings = ['date_crawled', 'name', 'seller', 'offer_type', 'price', 'abtest',
'vehicle_type', 'registration_year', 'gearbox', 'power_PS', 'model',
'odometer', 'registration_month', 'fuel_type', 'brand',
'unrepaired_damage', 'ad_created', 'nr_of_pictures', 'postal_code',
'last_seen']
autos.columns = New_headings
autos.head()
date_crawled | name | seller | offer_type | price | abtest | vehicle_type | registration_year | gearbox | power_PS | model | odometer | registration_month | fuel_type | brand | unrepaired_damage | ad_created | nr_of_pictures | postal_code | last_seen | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-26 17:47:46 | Peugeot_807_160_NAVTECH_ON_BOARD | privat | Angebot | $5,000 | control | bus | 2004 | manuell | 158 | andere | 150,000km | 3 | lpg | peugeot | nein | 2016-03-26 00:00:00 | 0 | 79588 | 2016-04-06 06:45:54 |
1 | 2016-04-04 13:38:56 | BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik | privat | Angebot | $8,500 | control | limousine | 1997 | automatik | 286 | 7er | 150,000km | 6 | benzin | bmw | nein | 2016-04-04 00:00:00 | 0 | 71034 | 2016-04-06 14:45:08 |
2 | 2016-03-26 18:57:24 | Volkswagen_Golf_1.6_United | privat | Angebot | $8,990 | test | limousine | 2009 | manuell | 102 | golf | 70,000km | 7 | benzin | volkswagen | nein | 2016-03-26 00:00:00 | 0 | 35394 | 2016-04-06 20:15:37 |
3 | 2016-03-12 16:58:10 | Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... | privat | Angebot | $4,350 | control | kleinwagen | 2007 | automatik | 71 | fortwo | 70,000km | 6 | benzin | smart | nein | 2016-03-12 00:00:00 | 0 | 33729 | 2016-03-15 03:16:28 |
4 | 2016-04-01 14:38:50 | Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... | privat | Angebot | $1,350 | test | kombi | 2003 | manuell | 0 | focus | 150,000km | 7 | benzin | ford | nein | 2016-04-01 00:00:00 | 0 | 39218 | 2016-04-01 14:38:50 |
autos.describe(include = 'all')
date_crawled | name | seller | offer_type | price | abtest | vehicle_type | registration_year | gearbox | power_PS | model | odometer | registration_month | fuel_type | brand | unrepaired_damage | ad_created | nr_of_pictures | postal_code | last_seen | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 50000 | 50000 | 50000 | 50000 | 50000 | 50000 | 44905 | 50000.000000 | 47320 | 50000.000000 | 47242 | 50000 | 50000.000000 | 45518 | 50000 | 40171 | 50000 | 50000.0 | 50000.000000 | 50000 |
unique | 48213 | 38754 | 2 | 2 | 2357 | 2 | 8 | NaN | 2 | NaN | 245 | 13 | NaN | 7 | 40 | 2 | 76 | NaN | NaN | 39481 |
top | 2016-03-12 16:06:22 | Ford_Fiesta | privat | Angebot | $0 | test | limousine | NaN | manuell | NaN | golf | 150,000km | NaN | benzin | volkswagen | nein | 2016-04-03 00:00:00 | NaN | NaN | 2016-04-07 06:17:27 |
freq | 3 | 78 | 49999 | 49999 | 1421 | 25756 | 12859 | NaN | 36993 | NaN | 4024 | 32424 | NaN | 30107 | 10687 | 35232 | 1946 | NaN | NaN | 8 |
mean | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2005.073280 | NaN | 116.355920 | NaN | NaN | 5.723360 | NaN | NaN | NaN | NaN | 0.0 | 50813.627300 | NaN |
std | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 105.712813 | NaN | 209.216627 | NaN | NaN | 3.711984 | NaN | NaN | NaN | NaN | 0.0 | 25779.747957 | NaN |
min | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1000.000000 | NaN | 0.000000 | NaN | NaN | 0.000000 | NaN | NaN | NaN | NaN | 0.0 | 1067.000000 | NaN |
25% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1999.000000 | NaN | 70.000000 | NaN | NaN | 3.000000 | NaN | NaN | NaN | NaN | 0.0 | 30451.000000 | NaN |
50% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2003.000000 | NaN | 105.000000 | NaN | NaN | 6.000000 | NaN | NaN | NaN | NaN | 0.0 | 49577.000000 | NaN |
75% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2008.000000 | NaN | 150.000000 | NaN | NaN | 9.000000 | NaN | NaN | NaN | NaN | 0.0 | 71540.000000 | NaN |
max | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 9999.000000 | NaN | 17700.000000 | NaN | NaN | 12.000000 | NaN | NaN | NaN | NaN | 0.0 | 99998.000000 | NaN |
seller
and offer_type
columns each contain only 1 unique value, so we will consider dropping themprice
and odometer
, are objects, so we will consider converting them to numeric values by stripping out the non-numeric characters, then converting to int64 or float64 and finally renaming the columnautos.drop('nr_of_pictures', axis=1)
date_crawled | name | seller | offer_type | price | abtest | vehicle_type | registration_year | gearbox | power_PS | model | odometer | registration_month | fuel_type | brand | unrepaired_damage | ad_created | postal_code | last_seen | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-26 17:47:46 | Peugeot_807_160_NAVTECH_ON_BOARD | privat | Angebot | $5,000 | control | bus | 2004 | manuell | 158 | andere | 150,000km | 3 | lpg | peugeot | nein | 2016-03-26 00:00:00 | 79588 | 2016-04-06 06:45:54 |
1 | 2016-04-04 13:38:56 | BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik | privat | Angebot | $8,500 | control | limousine | 1997 | automatik | 286 | 7er | 150,000km | 6 | benzin | bmw | nein | 2016-04-04 00:00:00 | 71034 | 2016-04-06 14:45:08 |
2 | 2016-03-26 18:57:24 | Volkswagen_Golf_1.6_United | privat | Angebot | $8,990 | test | limousine | 2009 | manuell | 102 | golf | 70,000km | 7 | benzin | volkswagen | nein | 2016-03-26 00:00:00 | 35394 | 2016-04-06 20:15:37 |
3 | 2016-03-12 16:58:10 | Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... | privat | Angebot | $4,350 | control | kleinwagen | 2007 | automatik | 71 | fortwo | 70,000km | 6 | benzin | smart | nein | 2016-03-12 00:00:00 | 33729 | 2016-03-15 03:16:28 |
4 | 2016-04-01 14:38:50 | Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... | privat | Angebot | $1,350 | test | kombi | 2003 | manuell | 0 | focus | 150,000km | 7 | benzin | ford | nein | 2016-04-01 00:00:00 | 39218 | 2016-04-01 14:38:50 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
49995 | 2016-03-27 14:38:19 | Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon | privat | Angebot | $24,900 | control | limousine | 2011 | automatik | 239 | q5 | 100,000km | 1 | diesel | audi | nein | 2016-03-27 00:00:00 | 82131 | 2016-04-01 13:47:40 |
49996 | 2016-03-28 10:50:25 | Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+... | privat | Angebot | $1,980 | control | cabrio | 1996 | manuell | 75 | astra | 150,000km | 5 | benzin | opel | nein | 2016-03-28 00:00:00 | 44807 | 2016-04-02 14:18:02 |
49997 | 2016-04-02 14:44:48 | Fiat_500_C_1.2_Dualogic_Lounge | privat | Angebot | $13,200 | test | cabrio | 2014 | automatik | 69 | 500 | 5,000km | 11 | benzin | fiat | nein | 2016-04-02 00:00:00 | 73430 | 2016-04-04 11:47:27 |
49998 | 2016-03-08 19:25:42 | Audi_A3_2.0_TDI_Sportback_Ambition | privat | Angebot | $22,900 | control | kombi | 2013 | manuell | 150 | a3 | 40,000km | 11 | diesel | audi | nein | 2016-03-08 00:00:00 | 35683 | 2016-04-05 16:45:07 |
49999 | 2016-03-14 00:42:12 | Opel_Vectra_1.6_16V | privat | Angebot | $1,250 | control | limousine | 1996 | manuell | 101 | vectra | 150,000km | 1 | benzin | opel | nein | 2016-03-13 00:00:00 | 45897 | 2016-04-06 21:18:48 |
50000 rows × 19 columns
autos.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 50000 entries, 0 to 49999 Data columns (total 20 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 date_crawled 50000 non-null object 1 name 50000 non-null object 2 seller 50000 non-null object 3 offer_type 50000 non-null object 4 price 50000 non-null object 5 abtest 50000 non-null object 6 vehicle_type 44905 non-null object 7 registration_year 50000 non-null int64 8 gearbox 47320 non-null object 9 power_PS 50000 non-null int64 10 model 47242 non-null object 11 odometer 50000 non-null object 12 registration_month 50000 non-null int64 13 fuel_type 45518 non-null object 14 brand 50000 non-null object 15 unrepaired_damage 40171 non-null object 16 ad_created 50000 non-null object 17 nr_of_pictures 50000 non-null int64 18 postal_code 50000 non-null int64 19 last_seen 50000 non-null object dtypes: int64(5), object(15) memory usage: 7.6+ MB
autos.head()
date_crawled | name | seller | offer_type | price | abtest | vehicle_type | registration_year | gearbox | power_PS | model | odometer | registration_month | fuel_type | brand | unrepaired_damage | ad_created | nr_of_pictures | postal_code | last_seen | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-26 17:47:46 | Peugeot_807_160_NAVTECH_ON_BOARD | privat | Angebot | $5,000 | control | bus | 2004 | manuell | 158 | andere | 150,000km | 3 | lpg | peugeot | nein | 2016-03-26 00:00:00 | 0 | 79588 | 2016-04-06 06:45:54 |
1 | 2016-04-04 13:38:56 | BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik | privat | Angebot | $8,500 | control | limousine | 1997 | automatik | 286 | 7er | 150,000km | 6 | benzin | bmw | nein | 2016-04-04 00:00:00 | 0 | 71034 | 2016-04-06 14:45:08 |
2 | 2016-03-26 18:57:24 | Volkswagen_Golf_1.6_United | privat | Angebot | $8,990 | test | limousine | 2009 | manuell | 102 | golf | 70,000km | 7 | benzin | volkswagen | nein | 2016-03-26 00:00:00 | 0 | 35394 | 2016-04-06 20:15:37 |
3 | 2016-03-12 16:58:10 | Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... | privat | Angebot | $4,350 | control | kleinwagen | 2007 | automatik | 71 | fortwo | 70,000km | 6 | benzin | smart | nein | 2016-03-12 00:00:00 | 0 | 33729 | 2016-03-15 03:16:28 |
4 | 2016-04-01 14:38:50 | Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... | privat | Angebot | $1,350 | test | kombi | 2003 | manuell | 0 | focus | 150,000km | 7 | benzin | ford | nein | 2016-04-01 00:00:00 | 0 | 39218 | 2016-04-01 14:38:50 |
autos.loc[:,"odometer"] = autos.loc[:,"odometer"].str.replace("km","").str.replace(",","")
autos.loc[:,"odometer"] = autos.loc[:,"odometer"].astype(int)
autos.rename({"odometer":"odometer_km"}, axis = 1, inplace = True)
autos.loc[:,"price"] = autos.loc[:,"price"].str.replace("$","").str.replace(",","")
autos.loc[:,"price"] = autos.loc[:,"price"].astype(int)
autos.head()
date_crawled | name | seller | offer_type | price | abtest | vehicle_type | registration_year | gearbox | power_PS | model | odometer_km | registration_month | fuel_type | brand | unrepaired_damage | ad_created | nr_of_pictures | postal_code | last_seen | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-26 17:47:46 | Peugeot_807_160_NAVTECH_ON_BOARD | privat | Angebot | 5000 | control | bus | 2004 | manuell | 158 | andere | 150000 | 3 | lpg | peugeot | nein | 2016-03-26 00:00:00 | 0 | 79588 | 2016-04-06 06:45:54 |
1 | 2016-04-04 13:38:56 | BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik | privat | Angebot | 8500 | control | limousine | 1997 | automatik | 286 | 7er | 150000 | 6 | benzin | bmw | nein | 2016-04-04 00:00:00 | 0 | 71034 | 2016-04-06 14:45:08 |
2 | 2016-03-26 18:57:24 | Volkswagen_Golf_1.6_United | privat | Angebot | 8990 | test | limousine | 2009 | manuell | 102 | golf | 70000 | 7 | benzin | volkswagen | nein | 2016-03-26 00:00:00 | 0 | 35394 | 2016-04-06 20:15:37 |
3 | 2016-03-12 16:58:10 | Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... | privat | Angebot | 4350 | control | kleinwagen | 2007 | automatik | 71 | fortwo | 70000 | 6 | benzin | smart | nein | 2016-03-12 00:00:00 | 0 | 33729 | 2016-03-15 03:16:28 |
4 | 2016-04-01 14:38:50 | Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... | privat | Angebot | 1350 | test | kombi | 2003 | manuell | 0 | focus | 150000 | 7 | benzin | ford | nein | 2016-04-01 00:00:00 | 0 | 39218 | 2016-04-01 14:38:50 |
odometer_km
and price
columns.autos.loc[:, 'odometer_km'].shape
(50000,)
autos.loc[:, 'odometer_km'].describe()
count 50000.000000 mean 125732.700000 std 40042.211706 min 5000.000000 25% 125000.000000 50% 150000.000000 75% 150000.000000 max 150000.000000 Name: odometer_km, dtype: float64
autos.loc[:, 'odometer_km'].value_counts().sort_index(ascending = False)
150000 32424 125000 5170 100000 2169 90000 1757 80000 1436 70000 1230 60000 1164 50000 1027 40000 819 30000 789 20000 784 10000 264 5000 967 Name: odometer_km, dtype: int64
autos.loc[:, 'price'].shape
(50000,)
autos.loc[:, 'price'].describe()
count 5.000000e+04 mean 9.840044e+03 std 4.811044e+05 min 0.000000e+00 25% 1.100000e+03 50% 2.950000e+03 75% 7.200000e+03 max 1.000000e+08 Name: price, dtype: float64
autos.loc[:, 'price'].value_counts().sort_index(ascending = False)
99999999 1 27322222 1 12345678 3 11111111 2 10000000 1 ... 5 2 3 1 2 3 1 156 0 1421 Name: price, Length: 2357, dtype: int64
autos.loc[autos.loc[:, 'price'] == 99999999, 'price'] = np.nan
autos.loc[autos.loc[:, 'price'] < 50, 'price'] = np.nan
autos = autos.dropna()
# checking the new largest and smallest values in the price series
autos.loc[:, 'price'].value_counts().sort_index(ascending = False)
12345678.0 1 11111111.0 1 999999.0 1 999990.0 1 350000.0 1 .. 70.0 1 66.0 1 65.0 1 60.0 2 50.0 15 Name: price, Length: 2146, dtype: int64
autos.shape
(34550, 20)
autos[['date_crawled','ad_created','last_seen']][0:5]
date_crawled | ad_created | last_seen | |
---|---|---|---|
0 | 2016-03-26 17:47:46 | 2016-03-26 00:00:00 | 2016-04-06 06:45:54 |
1 | 2016-04-04 13:38:56 | 2016-04-04 00:00:00 | 2016-04-06 14:45:08 |
2 | 2016-03-26 18:57:24 | 2016-03-26 00:00:00 | 2016-04-06 20:15:37 |
3 | 2016-03-12 16:58:10 | 2016-03-12 00:00:00 | 2016-03-15 03:16:28 |
4 | 2016-04-01 14:38:50 | 2016-04-01 00:00:00 | 2016-04-01 14:38:50 |
autos.loc[:, 'date_crawled'] = autos.loc[:, 'date_crawled'].str[:10]
autos.loc[:, 'ad_created'] = autos.loc[:, 'ad_created'].str[:10]
autos.loc[:, 'last_seen'] = autos.loc[:, 'last_seen'].str[:10]
autos[['date_crawled','ad_created','last_seen']][0:5]
date_crawled | ad_created | last_seen | |
---|---|---|---|
0 | 2016-03-26 | 2016-03-26 | 2016-04-06 |
1 | 2016-04-04 | 2016-04-04 | 2016-04-06 |
2 | 2016-03-26 | 2016-03-26 | 2016-04-06 |
3 | 2016-03-12 | 2016-03-12 | 2016-03-15 |
4 | 2016-04-01 | 2016-04-01 | 2016-04-01 |
autos.loc[:, 'registration_year'].value_counts().sort_index(ascending = False)
2018 1 2016 40 2015 344 2014 603 2013 731 ... 1950 1 1941 1 1937 1 1934 1 1931 1 Name: registration_year, Length: 68, dtype: int64
date_crawled
,ad_created
or last_seen
cells. This may be because I dropped any "problem rows" when removing Null values.autos = autos.loc[autos.loc[:, 'registration_year'].between(1950,2016),:]
autos.loc[:, 'date_crawled'] = autos.loc[:, 'date_crawled'].str.replace("-","").astype(int)
autos.loc[:, 'ad_created'] = autos.loc[:, 'ad_created'].str.replace("-","").astype(int)
autos.loc[:, 'last_seen'] = autos.loc[:, 'last_seen'].str.replace("-","").astype(int)
brands = autos.loc[:,'brand'].value_counts(normalize = True, ascending = False)
significant_brands = brands.loc[brands > 0.05].index
significant_brands
Index(['volkswagen', 'bmw', 'mercedes_benz', 'opel', 'audi', 'ford'], dtype='object')
mean_price = {}
for b in significant_brands:
mean = autos.loc[autos.loc[:,'brand']==b,'price'].mean()
mean_price[b] = mean
mean_price
{'volkswagen': 6445.090143218197, 'bmw': 9230.636386200049, 'mercedes_benz': 9543.794084507042, 'opel': 6953.577796706417, 'audi': 10581.772583701832, 'ford': 9577.086712683347}
mean_mileage = {}
for b in significant_brands:
mean = autos.loc[autos.loc[:,'brand']==b,'odometer_km'].mean()
mean_mileage[b] = mean
mean_mileage
{'volkswagen': 126191.37882617243, 'bmw': 131522.71034996276, 'mercedes_benz': 129816.9014084507, 'opel': 127535.49119818285, 'audi': 126366.0770688566, 'ford': 122918.46419327006}
price_series = pd.Series(mean_price)
mileage_series = pd.Series(mean_mileage)
df = pd.DataFrame(price_series, columns =['mean_price'])
df.loc[:,'mean_mileage'] = mileage_series
df
mean_price | mean_mileage | |
---|---|---|
volkswagen | 6445.090143 | 126191.378826 |
bmw | 9230.636386 | 131522.710350 |
mercedes_benz | 9543.794085 | 129816.901408 |
opel | 6953.577797 | 127535.491198 |
audi | 10581.772584 | 126366.077069 |
ford | 9577.086713 | 122918.464193 |
df.loc[:,'price_per_mile'] = df.iloc[:,0] / df.iloc[:,1]
df.sort_values(by = 'mean_price', ascending = False)
mean_price | mean_mileage | price_per_mile | |
---|---|---|---|
audi | 10581.772584 | 126366.077069 | 0.083739 |
ford | 9577.086713 | 122918.464193 | 0.077914 |
mercedes_benz | 9543.794085 | 129816.901408 | 0.073517 |
bmw | 9230.636386 | 131522.710350 | 0.070183 |
opel | 6953.577797 | 127535.491198 | 0.054523 |
volkswagen | 6445.090143 | 126191.378826 | 0.051074 |
mean_price
and mean_mileage
Find the most common brand/model combinations
Split the odometer_km into groups, and use aggregation to see if average prices follows any patterns based on the milage.
How much cheaper are cars with damage than their non-damaged counterparts?
autos.loc[:,'brand_model_comb'] = autos.loc[:,'brand'].str.cat(autos.loc[:,'model'])
autos.loc[:,'brand_model_comb'].value_counts().head(3)
volkswagengolf 2748 bmw3er 2077 volkswagenpolo 1129 Name: brand_model_comb, dtype: int64
_0km_50km_mean = autos.loc[autos.loc[:,'odometer_km'].between(0,50000), 'price'].mean()
_50km_100km_mean = autos.loc[autos.loc[:,'odometer_km'].between(50000,100000), 'price'].mean()
_100km_150km_mean = autos.loc[autos.loc[:,'odometer_km'].between(100000,150000), 'price'].mean()
print(_0km_50km_mean)
print(_50km_100km_mean)
print(_100km_150km_mean)
20224.724452554743 12408.266312921267 5439.197750877979
autos.head()
date_crawled | name | seller | offer_type | price | abtest | vehicle_type | registration_year | gearbox | power_PS | ... | odometer_km | registration_month | fuel_type | brand | unrepaired_damage | ad_created | nr_of_pictures | postal_code | last_seen | brand_model_comb | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 20160326 | Peugeot_807_160_NAVTECH_ON_BOARD | privat | Angebot | 5000.0 | control | bus | 2004 | manuell | 158 | ... | 150000 | 3 | lpg | peugeot | nein | 20160326 | 0 | 79588 | 20160406 | peugeotandere |
1 | 20160404 | BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik | privat | Angebot | 8500.0 | control | limousine | 1997 | automatik | 286 | ... | 150000 | 6 | benzin | bmw | nein | 20160404 | 0 | 71034 | 20160406 | bmw7er |
2 | 20160326 | Volkswagen_Golf_1.6_United | privat | Angebot | 8990.0 | test | limousine | 2009 | manuell | 102 | ... | 70000 | 7 | benzin | volkswagen | nein | 20160326 | 0 | 35394 | 20160406 | volkswagengolf |
3 | 20160312 | Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... | privat | Angebot | 4350.0 | control | kleinwagen | 2007 | automatik | 71 | ... | 70000 | 6 | benzin | smart | nein | 20160312 | 0 | 33729 | 20160315 | smartfortwo |
4 | 20160401 | Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... | privat | Angebot | 1350.0 | test | kombi | 2003 | manuell | 0 | ... | 150000 | 7 | benzin | ford | nein | 20160401 | 0 | 39218 | 20160401 | fordfocus |
5 rows × 21 columns
autos.loc[:,'unrepaired_damage'].unique()
array(['nein', 'ja'], dtype=object)
No_unrepaired_damage = autos.loc[autos.loc[:,'unrepaired_damage'] == 'nein', 'price'].mean()
unrepaired_damage = autos.loc[autos.loc[:,'unrepaired_damage'] == 'ja', 'price'].mean()
No_unrepaired_damage - unrepaired_damage
5421.324586578532