**Introduction**
The aim of this project is to clean and analyze a dataset of used car listings from eBay Kleinanzeigen, the classifieds section of the German eBay website.
import pandas as pd
autos = pd.read_csv('autos.csv', encoding='Latin-1')
autos
 | dateCrawled | name | seller | offerType | price | abtest | vehicleType | yearOfRegistration | gearbox | powerPS | model | odometer | monthOfRegistration | fuelType | brand | notRepairedDamage | dateCreated | nrOfPictures | postalCode | lastSeen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-26 17:47:46 | Peugeot_807_160_NAVTECH_ON_BOARD | privat | Angebot | $5,000 | control | bus | 2004 | manuell | 158 | andere | 150,000km | 3 | lpg | peugeot | nein | 2016-03-26 00:00:00 | 0 | 79588 | 2016-04-06 06:45:54 |
1 | 2016-04-04 13:38:56 | BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik | privat | Angebot | $8,500 | control | limousine | 1997 | automatik | 286 | 7er | 150,000km | 6 | benzin | bmw | nein | 2016-04-04 00:00:00 | 0 | 71034 | 2016-04-06 14:45:08 |
2 | 2016-03-26 18:57:24 | Volkswagen_Golf_1.6_United | privat | Angebot | $8,990 | test | limousine | 2009 | manuell | 102 | golf | 70,000km | 7 | benzin | volkswagen | nein | 2016-03-26 00:00:00 | 0 | 35394 | 2016-04-06 20:15:37 |
3 | 2016-03-12 16:58:10 | Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... | privat | Angebot | $4,350 | control | kleinwagen | 2007 | automatik | 71 | fortwo | 70,000km | 6 | benzin | smart | nein | 2016-03-12 00:00:00 | 0 | 33729 | 2016-03-15 03:16:28 |
4 | 2016-04-01 14:38:50 | Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... | privat | Angebot | $1,350 | test | kombi | 2003 | manuell | 0 | focus | 150,000km | 7 | benzin | ford | nein | 2016-04-01 00:00:00 | 0 | 39218 | 2016-04-01 14:38:50 |
5 | 2016-03-21 13:47:45 | Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto... | privat | Angebot | $7,900 | test | bus | 2006 | automatik | 150 | voyager | 150,000km | 4 | diesel | chrysler | NaN | 2016-03-21 00:00:00 | 0 | 22962 | 2016-04-06 09:45:21 |
6 | 2016-03-20 17:55:21 | VW_Golf_III_GT_Special_Electronic_Green_Metall... | privat | Angebot | $300 | test | limousine | 1995 | manuell | 90 | golf | 150,000km | 8 | benzin | volkswagen | NaN | 2016-03-20 00:00:00 | 0 | 31535 | 2016-03-23 02:48:59 |
7 | 2016-03-16 18:55:19 | Golf_IV_1.9_TDI_90PS | privat | Angebot | $1,990 | control | limousine | 1998 | manuell | 90 | golf | 150,000km | 12 | diesel | volkswagen | nein | 2016-03-16 00:00:00 | 0 | 53474 | 2016-04-07 03:17:32 |
8 | 2016-03-22 16:51:34 | Seat_Arosa | privat | Angebot | $250 | test | NaN | 2000 | manuell | 0 | arosa | 150,000km | 10 | NaN | seat | nein | 2016-03-22 00:00:00 | 0 | 7426 | 2016-03-26 18:18:10 |
9 | 2016-03-16 13:47:02 | Renault_Megane_Scenic_1.6e_RT_Klimaanlage | privat | Angebot | $590 | control | bus | 1997 | manuell | 90 | megane | 150,000km | 7 | benzin | renault | nein | 2016-03-16 00:00:00 | 0 | 15749 | 2016-04-06 10:46:35 |
10 | 2016-03-15 01:41:36 | VW_Golf_Tuning_in_siber/grau | privat | Angebot | $999 | test | NaN | 2017 | manuell | 90 | NaN | 150,000km | 4 | benzin | volkswagen | nein | 2016-03-14 00:00:00 | 0 | 86157 | 2016-04-07 03:16:21 |
11 | 2016-03-16 18:45:34 | Mercedes_A140_Motorschaden | privat | Angebot | $350 | control | NaN | 2000 | NaN | 0 | NaN | 150,000km | 0 | benzin | mercedes_benz | NaN | 2016-03-16 00:00:00 | 0 | 17498 | 2016-03-16 18:45:34 |
12 | 2016-03-31 19:48:22 | Smart_smart_fortwo_coupe_softouch_pure_MHD_Pan... | privat | Angebot | $5,299 | control | kleinwagen | 2010 | automatik | 71 | fortwo | 50,000km | 9 | benzin | smart | nein | 2016-03-31 00:00:00 | 0 | 34590 | 2016-04-06 14:17:52 |
13 | 2016-03-23 10:48:32 | Audi_A3_1.6_tuning | privat | Angebot | $1,350 | control | limousine | 1999 | manuell | 101 | a3 | 150,000km | 11 | benzin | audi | nein | 2016-03-23 00:00:00 | 0 | 12043 | 2016-04-01 14:17:13 |
14 | 2016-03-23 11:50:46 | Renault_Clio_3__Dynamique_1.2__16_V;_viele_Ver... | privat | Angebot | $3,999 | test | kleinwagen | 2007 | manuell | 75 | clio | 150,000km | 9 | benzin | renault | NaN | 2016-03-23 00:00:00 | 0 | 81737 | 2016-04-01 15:46:47 |
15 | 2016-04-01 12:06:20 | Corvette_C3_Coupe_T_Top_Crossfire_Injection | privat | Angebot | $18,900 | test | coupe | 1982 | automatik | 203 | NaN | 80,000km | 6 | benzin | sonstige_autos | nein | 2016-04-01 00:00:00 | 0 | 61276 | 2016-04-02 21:10:48 |
16 | 2016-03-16 14:59:02 | Opel_Vectra_B_Kombi | privat | Angebot | $350 | test | kombi | 1999 | manuell | 101 | vectra | 150,000km | 5 | benzin | opel | nein | 2016-03-16 00:00:00 | 0 | 57299 | 2016-03-18 05:29:37 |
17 | 2016-03-29 11:46:22 | Volkswagen_Scirocco_2_G60 | privat | Angebot | $5,500 | test | coupe | 1990 | manuell | 205 | scirocco | 150,000km | 6 | benzin | volkswagen | nein | 2016-03-29 00:00:00 | 0 | 74821 | 2016-04-05 20:46:26 |
18 | 2016-03-26 19:57:44 | Verkaufen_mein_bmw_e36_320_i_touring | privat | Angebot | $300 | control | bus | 1995 | manuell | 150 | 3er | 150,000km | 0 | benzin | bmw | NaN | 2016-03-26 00:00:00 | 0 | 54329 | 2016-04-02 12:16:41 |
19 | 2016-03-17 13:36:21 | mazda_tribute_2.0_mit_gas_und_tuev_neu_2018 | privat | Angebot | $4,150 | control | suv | 2004 | manuell | 124 | andere | 150,000km | 2 | lpg | mazda | nein | 2016-03-17 00:00:00 | 0 | 40878 | 2016-03-17 14:45:58 |
20 | 2016-03-05 19:57:31 | Audi_A4_Avant_1.9_TDI_*6_Gang*AHK*Klimatronik*... | privat | Angebot | $3,500 | test | kombi | 2003 | manuell | 131 | a4 | 150,000km | 5 | diesel | audi | NaN | 2016-03-05 00:00:00 | 0 | 53913 | 2016-03-07 05:46:46 |
21 | 2016-03-06 19:07:10 | Porsche_911_Carrera_4S_Cabrio | privat | Angebot | $41,500 | test | cabrio | 2004 | manuell | 320 | 911 | 150,000km | 4 | benzin | porsche | nein | 2016-03-06 00:00:00 | 0 | 65428 | 2016-04-05 23:46:19 |
22 | 2016-03-28 20:50:54 | MINI_Cooper_S_Cabrio | privat | Angebot | $25,450 | control | cabrio | 2015 | manuell | 184 | cooper | 10,000km | 1 | benzin | mini | nein | 2016-03-28 00:00:00 | 0 | 44789 | 2016-04-01 06:45:30 |
23 | 2016-03-10 19:55:34 | Peugeot_Boxer_2_2_HDi_120_Ps_9_Sitzer_inkl_Klima | privat | Angebot | $7,999 | control | bus | 2010 | manuell | 120 | NaN | 150,000km | 2 | diesel | peugeot | nein | 2016-03-10 00:00:00 | 0 | 30900 | 2016-03-17 08:45:17 |
24 | 2016-04-03 11:57:02 | BMW_535i_xDrive_Sport_Aut. | privat | Angebot | $48,500 | control | limousine | 2014 | automatik | 306 | 5er | 30,000km | 12 | benzin | bmw | nein | 2016-04-03 00:00:00 | 0 | 22547 | 2016-04-07 13:16:50 |
25 | 2016-03-21 21:56:18 | Ford_escort_kombi_an_bastler_mit_ghia_ausstattung | privat | Angebot | $90 | control | kombi | 1996 | manuell | 116 | NaN | 150,000km | 4 | benzin | ford | ja | 2016-03-21 00:00:00 | 0 | 27574 | 2016-04-01 05:16:49 |
26 | 2016-04-03 22:46:28 | Volkswagen_Polo_Fox | privat | Angebot | $777 | control | kleinwagen | 1992 | manuell | 54 | polo | 125,000km | 2 | benzin | volkswagen | nein | 2016-04-03 00:00:00 | 0 | 38110 | 2016-04-05 23:46:48 |
27 | 2016-03-27 18:45:01 | Hat_einer_Ahnung_mit_Ford_Galaxy_HILFE | privat | Angebot | $0 | control | NaN | 2005 | NaN | 0 | NaN | 150,000km | 0 | NaN | ford | NaN | 2016-03-27 00:00:00 | 0 | 66701 | 2016-03-27 18:45:01 |
28 | 2016-03-19 21:56:19 | MINI_Cooper_D | privat | Angebot | $5,250 | control | kleinwagen | 2007 | manuell | 110 | cooper | 150,000km | 7 | diesel | mini | ja | 2016-03-19 00:00:00 | 0 | 15745 | 2016-04-07 14:58:48 |
29 | 2016-04-02 12:45:44 | Mercedes_Benz_E_320_T_CDI_Avantgarde_DPF7_Sitz... | privat | Angebot | $4,999 | test | kombi | 2004 | automatik | 204 | e_klasse | 150,000km | 10 | diesel | mercedes_benz | nein | 2016-04-02 00:00:00 | 0 | 47638 | 2016-04-02 12:45:44 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
49970 | 2016-03-21 22:47:37 | c4_Grand_Picasso_mit_Automatik_Leder_Navi_Temp... | privat | Angebot | $15,800 | control | bus | 2010 | automatik | 136 | c4 | 60,000km | 4 | diesel | citroen | nein | 2016-03-21 00:00:00 | 0 | 14947 | 2016-04-07 04:17:34 |
49971 | 2016-03-29 14:54:12 | W.Lupo_1.0 | privat | Angebot | $950 | test | kleinwagen | 2001 | manuell | 50 | lupo | 150,000km | 4 | benzin | volkswagen | nein | 2016-03-29 00:00:00 | 0 | 65197 | 2016-03-29 20:41:51 |
49972 | 2016-03-26 22:25:23 | Mercedes_Benz_Vito_115_CDI_Extralang_Aut. | privat | Angebot | $3,300 | control | bus | 2004 | automatik | 150 | vito | 150,000km | 10 | diesel | mercedes_benz | ja | 2016-03-26 00:00:00 | 0 | 65326 | 2016-03-28 11:28:18 |
49973 | 2016-03-27 05:32:39 | Mercedes_Benz_SLK_200_Kompressor | privat | Angebot | $6,000 | control | cabrio | 2004 | manuell | 163 | slk | 150,000km | 11 | benzin | mercedes_benz | nein | 2016-03-27 00:00:00 | 0 | 53567 | 2016-03-27 08:25:24 |
49974 | 2016-03-20 10:52:31 | Golf_1_Cabrio_Tuev_Neu_viele_Extras_alles_eing... | privat | Angebot | $0 | control | cabrio | 1983 | manuell | 70 | golf | 150,000km | 2 | benzin | volkswagen | nein | 2016-03-20 00:00:00 | 0 | 8209 | 2016-03-27 19:48:16 |
49975 | 2016-03-27 20:51:39 | Honda_Jazz_1.3_DSi_i_VTEC_IMA_CVT_Comfort | privat | Angebot | $9,700 | control | kleinwagen | 2012 | automatik | 88 | jazz | 100,000km | 11 | hybrid | honda | nein | 2016-03-27 00:00:00 | 0 | 84385 | 2016-04-05 19:45:34 |
49976 | 2016-03-19 18:56:05 | Audi_80_Avant_2.6_E__Vollausstattung!!_Einziga... | privat | Angebot | $5,900 | test | kombi | 1992 | automatik | 150 | 80 | 150,000km | 12 | benzin | audi | nein | 2016-03-19 00:00:00 | 0 | 36100 | 2016-04-07 06:16:44 |
49977 | 2016-03-31 18:37:18 | Mercedes_Benz_C200_Cdi_W203 | privat | Angebot | $5,500 | control | limousine | 2003 | manuell | 116 | c_klasse | 150,000km | 2 | diesel | mercedes_benz | nein | 2016-03-31 00:00:00 | 0 | 33739 | 2016-04-06 12:16:11 |
49978 | 2016-04-04 10:37:14 | Mercedes_Benz_E_200_Classic | privat | Angebot | $900 | control | limousine | 1996 | automatik | 136 | e_klasse | 150,000km | 9 | benzin | mercedes_benz | ja | 2016-04-04 00:00:00 | 0 | 24405 | 2016-04-06 12:44:20 |
49979 | 2016-03-20 18:38:40 | Volkswagen_Polo_1.6_TDI_Style | privat | Angebot | $11,000 | test | kleinwagen | 2011 | manuell | 90 | polo | 70,000km | 11 | diesel | volkswagen | nein | 2016-03-20 00:00:00 | 0 | 48455 | 2016-04-07 01:45:12 |
49980 | 2016-03-12 10:55:54 | Ford_Escort_Turnier_16V | privat | Angebot | $400 | control | kombi | 1995 | manuell | 105 | escort | 125,000km | 3 | benzin | ford | NaN | 2016-03-12 00:00:00 | 0 | 56218 | 2016-04-06 17:16:49 |
49981 | 2016-03-15 09:38:21 | Opel_Astra_Kombi_mit_Anhaengerkupplung | privat | Angebot | $2,000 | control | kombi | 1998 | manuell | 115 | astra | 150,000km | 12 | benzin | opel | nein | 2016-03-15 00:00:00 | 0 | 86859 | 2016-04-05 17:21:46 |
49982 | 2016-03-29 18:51:08 | Skoda_Fabia_4_Tuerer_Bj:2004__85.000Tkm | privat | Angebot | $1,950 | control | kleinwagen | 2004 | manuell | 0 | fabia | 90,000km | 7 | benzin | skoda | NaN | 2016-03-29 00:00:00 | 0 | 45884 | 2016-03-29 18:51:08 |
49983 | 2016-03-06 12:43:04 | Ford_focus_99 | privat | Angebot | $600 | test | kleinwagen | 1999 | manuell | 101 | focus | 150,000km | 4 | benzin | ford | NaN | 2016-03-06 00:00:00 | 0 | 52477 | 2016-03-09 06:16:08 |
49984 | 2016-03-31 22:48:48 | Student_sucht_ein__Anfaengerauto___ab_2000_BJ_... | privat | Angebot | $0 | test | NaN | 2000 | NaN | 0 | NaN | 150,000km | 0 | NaN | sonstige_autos | NaN | 2016-03-31 00:00:00 | 0 | 12103 | 2016-04-02 19:44:53 |
49985 | 2016-04-02 16:38:23 | Verkaufe_meinen_vw_vento! | privat | Angebot | $1,000 | control | NaN | 1995 | automatik | 0 | NaN | 150,000km | 0 | benzin | volkswagen | NaN | 2016-04-02 00:00:00 | 0 | 30900 | 2016-04-06 15:17:52 |
49986 | 2016-04-04 20:46:02 | Chrysler_300C_3.0_CRD_DPF_Automatik_Voll_Ausst... | privat | Angebot | $15,900 | control | limousine | 2010 | automatik | 218 | 300c | 125,000km | 11 | diesel | chrysler | nein | 2016-04-04 00:00:00 | 0 | 73527 | 2016-04-06 23:16:00 |
49987 | 2016-03-22 20:47:27 | Audi_A3_Limousine_2.0_TDI_DPF_Ambition__NAVI__... | privat | Angebot | $21,990 | control | limousine | 2013 | manuell | 150 | a3 | 50,000km | 11 | diesel | audi | nein | 2016-03-22 00:00:00 | 0 | 94362 | 2016-03-26 22:46:06 |
49988 | 2016-03-28 19:49:51 | BMW_330_Ci | privat | Angebot | $9,550 | control | coupe | 2001 | manuell | 231 | 3er | 150,000km | 10 | benzin | bmw | nein | 2016-03-28 00:00:00 | 0 | 83646 | 2016-04-07 02:17:40 |
49989 | 2016-03-11 19:50:37 | VW_Polo_zum_Ausschlachten_oder_Wiederaufbau | privat | Angebot | $150 | test | kleinwagen | 1997 | manuell | 0 | polo | 150,000km | 5 | benzin | volkswagen | ja | 2016-03-11 00:00:00 | 0 | 21244 | 2016-03-12 10:17:55 |
49990 | 2016-03-21 19:54:19 | Mercedes_Benz_A_200__BlueEFFICIENCY__Urban | privat | Angebot | $17,500 | test | limousine | 2012 | manuell | 156 | a_klasse | 30,000km | 12 | benzin | mercedes_benz | nein | 2016-03-21 00:00:00 | 0 | 58239 | 2016-04-06 22:46:57 |
49991 | 2016-03-06 15:25:19 | Kleinwagen | privat | Angebot | $500 | control | NaN | 2016 | manuell | 0 | twingo | 150,000km | 0 | benzin | renault | NaN | 2016-03-06 00:00:00 | 0 | 61350 | 2016-03-06 18:24:19 |
49992 | 2016-03-10 19:37:38 | Fiat_Grande_Punto_1.4_T_Jet_16V_Sport | privat | Angebot | $4,800 | control | kleinwagen | 2009 | manuell | 120 | andere | 125,000km | 9 | lpg | fiat | nein | 2016-03-10 00:00:00 | 0 | 68642 | 2016-03-13 01:44:51 |
49993 | 2016-03-15 18:47:35 | Audi_A3__1_8l__Silber;_schoenes_Fahrzeug | privat | Angebot | $1,650 | control | kleinwagen | 1997 | manuell | 0 | NaN | 150,000km | 7 | benzin | audi | NaN | 2016-03-15 00:00:00 | 0 | 65203 | 2016-04-06 19:46:53 |
49994 | 2016-03-22 17:36:42 | Audi_A6__S6__Avant_4.2_quattro_eventuell_Tausc... | privat | Angebot | $5,000 | control | kombi | 2001 | automatik | 299 | a6 | 150,000km | 1 | benzin | audi | nein | 2016-03-22 00:00:00 | 0 | 46537 | 2016-04-06 08:16:39 |
49995 | 2016-03-27 14:38:19 | Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon | privat | Angebot | $24,900 | control | limousine | 2011 | automatik | 239 | q5 | 100,000km | 1 | diesel | audi | nein | 2016-03-27 00:00:00 | 0 | 82131 | 2016-04-01 13:47:40 |
49996 | 2016-03-28 10:50:25 | Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+... | privat | Angebot | $1,980 | control | cabrio | 1996 | manuell | 75 | astra | 150,000km | 5 | benzin | opel | nein | 2016-03-28 00:00:00 | 0 | 44807 | 2016-04-02 14:18:02 |
49997 | 2016-04-02 14:44:48 | Fiat_500_C_1.2_Dualogic_Lounge | privat | Angebot | $13,200 | test | cabrio | 2014 | automatik | 69 | 500 | 5,000km | 11 | benzin | fiat | nein | 2016-04-02 00:00:00 | 0 | 73430 | 2016-04-04 11:47:27 |
49998 | 2016-03-08 19:25:42 | Audi_A3_2.0_TDI_Sportback_Ambition | privat | Angebot | $22,900 | control | kombi | 2013 | manuell | 150 | a3 | 40,000km | 11 | diesel | audi | nein | 2016-03-08 00:00:00 | 0 | 35683 | 2016-04-05 16:45:07 |
49999 | 2016-03-14 00:42:12 | Opel_Vectra_1.6_16V | privat | Angebot | $1,250 | control | limousine | 1996 | manuell | 101 | vectra | 150,000km | 1 | benzin | opel | nein | 2016-03-13 00:00:00 | 0 | 45897 | 2016-04-06 21:18:48 |
50000 rows × 20 columns
print(autos.info())
print(autos.head())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 20 columns):
dateCrawled 50000 non-null object
name 50000 non-null object
seller 50000 non-null object
offerType 50000 non-null object
price 50000 non-null object
abtest 50000 non-null object
vehicleType 44905 non-null object
yearOfRegistration 50000 non-null int64
gearbox 47320 non-null object
powerPS 50000 non-null int64
model 47242 non-null object
odometer 50000 non-null object
monthOfRegistration 50000 non-null int64
fuelType 45518 non-null object
brand 50000 non-null object
notRepairedDamage 40171 non-null object
dateCreated 50000 non-null object
nrOfPictures 50000 non-null int64
postalCode 50000 non-null int64
lastSeen 50000 non-null object
dtypes: int64(5), object(15)
memory usage: 7.6+ MB
None
 | dateCrawled | name | seller | offerType | price | abtest | vehicleType | yearOfRegistration | gearbox | powerPS | model | odometer | monthOfRegistration | fuelType | brand | notRepairedDamage | dateCreated | nrOfPictures | postalCode | lastSeen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-26 17:47:46 | Peugeot_807_160_NAVTECH_ON_BOARD | privat | Angebot | $5,000 | control | bus | 2004 | manuell | 158 | andere | 150,000km | 3 | lpg | peugeot | nein | 2016-03-26 00:00:00 | 0 | 79588 | 2016-04-06 06:45:54 |
1 | 2016-04-04 13:38:56 | BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik | privat | Angebot | $8,500 | control | limousine | 1997 | automatik | 286 | 7er | 150,000km | 6 | benzin | bmw | nein | 2016-04-04 00:00:00 | 0 | 71034 | 2016-04-06 14:45:08 |
2 | 2016-03-26 18:57:24 | Volkswagen_Golf_1.6_United | privat | Angebot | $8,990 | test | limousine | 2009 | manuell | 102 | golf | 70,000km | 7 | benzin | volkswagen | nein | 2016-03-26 00:00:00 | 0 | 35394 | 2016-04-06 20:15:37 |
3 | 2016-03-12 16:58:10 | Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... | privat | Angebot | $4,350 | control | kleinwagen | 2007 | automatik | 71 | fortwo | 70,000km | 6 | benzin | smart | nein | 2016-03-12 00:00:00 | 0 | 33729 | 2016-03-15 03:16:28 |
4 | 2016-04-01 14:38:50 | Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... | privat | Angebot | $1,350 | test | kombi | 2003 | manuell | 0 | focus | 150,000km | 7 | benzin | ford | nein | 2016-04-01 00:00:00 | 0 | 39218 | 2016-04-01 14:38:50 |
The autos dataset contains twenty columns, most of which are stored as strings. The dataset must be cleaned and organised before further analysis. The cleaning steps include renaming the columns from camelCase to snake_case, converting numeric columns that are stored as text into numeric data, and replacing words in some column series.
Every column that contains null values will also be analysed.
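As a minimal sketch of how the null-value analysis mentioned above could start, the snippet below counts missing values per column and expresses them as a share of all rows. The `sample` frame is a hypothetical five-row stand-in for `autos`, and the names `null_counts`, `null_share` and `null_summary` are assumptions introduced here for illustration.

```python
import pandas as pd

# Hypothetical five-row sample standing in for the full autos DataFrame.
sample = pd.DataFrame({
    "vehicleType": ["bus", "limousine", None, "kombi", None],
    "gearbox": ["manuell", "automatik", "manuell", None, "manuell"],
    "brand": ["peugeot", "bmw", "seat", "ford", "volkswagen"],
})

# Count missing values per column and express them as a share of all rows.
null_counts = sample.isnull().sum()
null_share = null_counts / len(sample)

# Keep only the columns that actually contain missing values.
null_summary = pd.DataFrame({"missing": null_counts, "share": null_share})
null_summary = null_summary[null_summary["missing"] > 0]
print(null_summary)
```

Run against the real `autos` frame, the same calls would list the columns with gaps that `autos.info()` reports (for example `vehicleType` and `notRepairedDamage`).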
name_array = autos.columns
print(name_array)
Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest', 'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model', 'odometer', 'monthOfRegistration', 'fuelType', 'brand', 'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode', 'lastSeen'], dtype='object')
autos.columns
Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest', 'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model', 'odometer', 'monthOfRegistration', 'fuelType', 'brand', 'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode', 'lastSeen'], dtype='object')
autos.columns = ['date_crawled', 'name', 'seller', 'offer_type', 'price',
'ab_test', 'vehicle_type', 'registration_year', 'gearbox',
'power_ps', 'model', 'odometer', 'registration_month',
'fuel_type', 'brand', 'unrepaired_damage', 'ad_created',
'nr_of_pictures', 'postal_code', 'last_seen']
print(autos.head())
 | date_crawled | name | seller | offer_type | price | ab_test | vehicle_type | registration_year | gearbox | power_ps | model | odometer | registration_month | fuel_type | brand | unrepaired_damage | ad_created | nr_of_pictures | postal_code | last_seen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-26 17:47:46 | Peugeot_807_160_NAVTECH_ON_BOARD | privat | Angebot | $5,000 | control | bus | 2004 | manuell | 158 | andere | 150,000km | 3 | lpg | peugeot | nein | 2016-03-26 00:00:00 | 0 | 79588 | 2016-04-06 06:45:54 |
1 | 2016-04-04 13:38:56 | BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik | privat | Angebot | $8,500 | control | limousine | 1997 | automatik | 286 | 7er | 150,000km | 6 | benzin | bmw | nein | 2016-04-04 00:00:00 | 0 | 71034 | 2016-04-06 14:45:08 |
2 | 2016-03-26 18:57:24 | Volkswagen_Golf_1.6_United | privat | Angebot | $8,990 | test | limousine | 2009 | manuell | 102 | golf | 70,000km | 7 | benzin | volkswagen | nein | 2016-03-26 00:00:00 | 0 | 35394 | 2016-04-06 20:15:37 |
3 | 2016-03-12 16:58:10 | Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... | privat | Angebot | $4,350 | control | kleinwagen | 2007 | automatik | 71 | fortwo | 70,000km | 6 | benzin | smart | nein | 2016-03-12 00:00:00 | 0 | 33729 | 2016-03-15 03:16:28 |
4 | 2016-04-01 14:38:50 | Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... | privat | Angebot | $1,350 | test | kombi | 2003 | manuell | 0 | focus | 150,000km | 7 | benzin | ford | nein | 2016-04-01 00:00:00 | 0 | 39218 | 2016-04-01 14:38:50 |
In the above code we renamed the columns from camelCase to snake_case, the more common naming convention in Python. With this convention the column names are more readable and accessible for data analysis.
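The renaming above was typed out by hand. A possible alternative, sketched here under assumptions, is to convert most names mechanically and keep a small dictionary for the columns that were reworded rather than just re-cased. The helper `camel_to_snake` and the `overrides` dictionary are hypothetical names, not part of the original notebook.

```python
import re

def camel_to_snake(name):
    """Insert an underscore before each capital that follows a lowercase letter or digit, then lowercase."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()

# Hypothetical overrides for columns the notebook reworded instead of only re-casing.
overrides = {
    "abtest": "ab_test",
    "yearOfRegistration": "registration_year",
    "monthOfRegistration": "registration_month",
    "notRepairedDamage": "unrepaired_damage",
    "dateCreated": "ad_created",
}

original = ['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',
            'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model',
            'odometer', 'monthOfRegistration', 'fuelType', 'brand',
            'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode',
            'lastSeen']

# Use the override when one exists, otherwise fall back to the mechanical conversion.
renamed = [overrides.get(col, camel_to_snake(col)) for col in original]
print(renamed)
```

With the real data, `autos.columns = renamed` would then apply the result. The manual list is arguably easier to read for twenty columns; the helper mainly pays off on wider tables.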
autos.describe(include='all')
 | date_crawled | name | seller | offer_type | price | ab_test | vehicle_type | registration_year | gearbox | power_ps | model | odometer | registration_month | fuel_type | brand | unrepaired_damage | ad_created | nr_of_pictures | postal_code | last_seen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 50000 | 50000 | 50000 | 50000 | 50000 | 50000 | 44905 | 50000.000000 | 47320 | 50000.000000 | 47242 | 50000 | 50000.000000 | 45518 | 50000 | 40171 | 50000 | 50000.0 | 50000.000000 | 50000 |
unique | 48213 | 38754 | 2 | 2 | 2357 | 2 | 8 | NaN | 2 | NaN | 245 | 13 | NaN | 7 | 40 | 2 | 76 | NaN | NaN | 39481 |
top | 2016-03-19 17:36:18 | Ford_Fiesta | privat | Angebot | $0 | test | limousine | NaN | manuell | NaN | golf | 150,000km | NaN | benzin | volkswagen | nein | 2016-04-03 00:00:00 | NaN | NaN | 2016-04-07 06:17:27 |
freq | 3 | 78 | 49999 | 49999 | 1421 | 25756 | 12859 | NaN | 36993 | NaN | 4024 | 32424 | NaN | 30107 | 10687 | 35232 | 1946 | NaN | NaN | 8 |
mean | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2005.073280 | NaN | 116.355920 | NaN | NaN | 5.723360 | NaN | NaN | NaN | NaN | 0.0 | 50813.627300 | NaN |
std | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 105.712813 | NaN | 209.216627 | NaN | NaN | 3.711984 | NaN | NaN | NaN | NaN | 0.0 | 25779.747957 | NaN |
min | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1000.000000 | NaN | 0.000000 | NaN | NaN | 0.000000 | NaN | NaN | NaN | NaN | 0.0 | 1067.000000 | NaN |
25% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1999.000000 | NaN | 70.000000 | NaN | NaN | 3.000000 | NaN | NaN | NaN | NaN | 0.0 | 30451.000000 | NaN |
50% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2003.000000 | NaN | 105.000000 | NaN | NaN | 6.000000 | NaN | NaN | NaN | NaN | 0.0 | 49577.000000 | NaN |
75% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2008.000000 | NaN | 150.000000 | NaN | NaN | 9.000000 | NaN | NaN | NaN | NaN | 0.0 | 71540.000000 | NaN |
max | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 9999.000000 | NaN | 17700.000000 | NaN | NaN | 12.000000 | NaN | NaN | NaN | NaN | 0.0 | 99998.000000 | NaN |
The ["nr_of_pictures"] column can be dropped entirely, since its value is always 0. The columns ["name", "fuel_type", "gearbox", "seller", "offer_type", "vehicle_type"] must be investigated further and cleaned. Two columns, ["price", "odometer"], are stored as text and have to be converted to integers.
In the code below we clean the ["price", "odometer"] columns and convert them to integers. We also rename the ["odometer"] column to ["odometer_km"].
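One way to confirm that a column such as ["nr_of_pictures"] carries no information is to count its distinct values before dropping it. The sketch below does this on a hypothetical four-row `sample` frame; the variable names `distinct`, `constant_cols` and `trimmed` are illustrative assumptions, and the notebook itself does not show the drop.

```python
import pandas as pd

# Hypothetical sample standing in for autos after the column renaming.
sample = pd.DataFrame({
    "brand": ["peugeot", "bmw", "volkswagen", "smart"],
    "nr_of_pictures": [0, 0, 0, 0],
    "price": ["$5,000", "$8,500", "$8,990", "$4,350"],
})

# A column with exactly one distinct value (counting NaN) is constant.
distinct = sample.nunique(dropna=False)
constant_cols = distinct[distinct == 1].index.tolist()

# Drop the constant columns and keep everything else.
trimmed = sample.drop(columns=constant_cols)
print(constant_cols)
print(trimmed.columns.tolist())
```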
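# Strip the unit suffix and thousands separators, then convert to integers.
autos["odometer"] = autos["odometer"].str.replace("km", "", regex=False)
autos["odometer"] = autos["odometer"].str.replace(",", "", regex=False)
autos["odometer"] = autos["odometer"].astype(int)
autos.rename({"odometer":"odometer_km"}, axis=1, inplace=True)
# regex=False treats "$" as a literal character rather than a regex end-of-string anchor.
autos["price"] = autos["price"].str.replace("$", "", regex=False)
autos["price"] = autos["price"].str.replace(",", "", regex=False)
autos["price"] = autos["price"].astype(int)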
autos.head()
 | date_crawled | name | seller | offer_type | price | ab_test | vehicle_type | registration_year | gearbox | power_ps | model | odometer_km | registration_month | fuel_type | brand | unrepaired_damage | ad_created | nr_of_pictures | postal_code | last_seen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-26 17:47:46 | Peugeot_807_160_NAVTECH_ON_BOARD | privat | Angebot | 5000 | control | bus | 2004 | manuell | 158 | andere | 150000 | 3 | lpg | peugeot | nein | 2016-03-26 00:00:00 | 0 | 79588 | 2016-04-06 06:45:54 |
1 | 2016-04-04 13:38:56 | BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik | privat | Angebot | 8500 | control | limousine | 1997 | automatik | 286 | 7er | 150000 | 6 | benzin | bmw | nein | 2016-04-04 00:00:00 | 0 | 71034 | 2016-04-06 14:45:08 |
2 | 2016-03-26 18:57:24 | Volkswagen_Golf_1.6_United | privat | Angebot | 8990 | test | limousine | 2009 | manuell | 102 | golf | 70000 | 7 | benzin | volkswagen | nein | 2016-03-26 00:00:00 | 0 | 35394 | 2016-04-06 20:15:37 |
3 | 2016-03-12 16:58:10 | Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... | privat | Angebot | 4350 | control | kleinwagen | 2007 | automatik | 71 | fortwo | 70000 | 6 | benzin | smart | nein | 2016-03-12 00:00:00 | 0 | 33729 | 2016-03-15 03:16:28 |
4 | 2016-04-01 14:38:50 | Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... | privat | Angebot | 1350 | test | kombi | 2003 | manuell | 0 | focus | 150000 | 7 | benzin | ford | nein | 2016-04-01 00:00:00 | 0 | 39218 | 2016-04-01 14:38:50 |
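The price and odometer conversions follow the same pattern: strip a few literal tokens, then cast to integer. A small helper can express that once. This is only a sketch on a hypothetical three-row `sample` frame; `to_int_column` is a name introduced here, not something from the original notebook.

```python
import pandas as pd

def to_int_column(series, tokens):
    """Remove each literal token from a string column, then cast the result to int."""
    cleaned = series
    for token in tokens:
        # regex=False keeps "$" a plain character instead of a regex anchor.
        cleaned = cleaned.str.replace(token, "", regex=False)
    return cleaned.astype(int)

# Hypothetical sample in the same text format as the raw columns.
sample = pd.DataFrame({
    "price": ["$5,000", "$8,500", "$0"],
    "odometer": ["150,000km", "70,000km", "5,000km"],
})

sample["price"] = to_int_column(sample["price"], ["$", ","])
sample["odometer_km"] = to_int_column(sample["odometer"], ["km", ","])
sample = sample.drop(columns="odometer")
print(sample)
```

Passing `regex=False` explicitly avoids depending on the default of `str.replace`, which has differed between pandas versions.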
Below we analyse the ["price", "odometer_km"] columns further.
min_price = autos["price"].min()
max_price = autos["price"].max()
min_odometer = autos["odometer_km"].min()
max_odometer = autos["odometer_km"].max()
print(min_price, max_price)
print(min_odometer, max_odometer)
0 99999999
5000 150000
price_uni_values = autos["price"].unique().shape
odometer_uni_values = autos["odometer_km"].unique().shape
print(price_uni_values)
print(odometer_uni_values)
(2357,)
(13,)
autos["odometer_km"].describe()
count 50000.000000
mean 125732.700000
std 40042.211706
min 5000.000000
25% 125000.000000
50% 150000.000000
75% 150000.000000
max 150000.000000
Name: odometer_km, dtype: float64
autos["price"].describe()
count 5.000000e+04
mean 9.840044e+03
std 4.811044e+05
min 0.000000e+00
25% 1.100000e+03
50% 2.950000e+03
75% 7.200000e+03
max 1.000000e+08
Name: price, dtype: float64
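# outlier_price is not defined anywhere in the source; a cutoff of 150000 is assumed here, consistent with the output below.
outlier_price = autos[autos["price"] > 150000]
outlier_high = outlier_price["price"].value_counts().sort_index(ascending = False).head(10)
outlier_low = outlier_price["price"].value_counts().sort_index(ascending = True).head(10)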
print(outlier_high)
print(outlier_low)
99999999 1
27322222 1
12345678 3
11111111 2
10000000 1
3890000 1
1300000 1
1234566 1
999999 2
999990 1
Name: price, dtype: int64
151990 1
155000 1
163500 1
163991 1
169000 1
169999 1
175000 1
180000 1
190000 1
194000 1
Name: price, dtype: int64
# Keep only listings priced at or below the $3,890,000 cutoff.
autos_new = autos[autos["price"] <= 3890000]
print(autos_new.head(5))
 | date_crawled | name | seller | offer_type | price | ab_test | vehicle_type | registration_year | gearbox | power_ps | model | odometer_km | registration_month | fuel_type | brand | unrepaired_damage | ad_created | nr_of_pictures | postal_code | last_seen |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-26 17:47:46 | Peugeot_807_160_NAVTECH_ON_BOARD | privat | Angebot | 5000 | control | bus | 2004 | manuell | 158 | andere | 150000 | 3 | lpg | peugeot | nein | 2016-03-26 00:00:00 | 0 | 79588 | 2016-04-06 06:45:54 |
1 | 2016-04-04 13:38:56 | BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik | privat | Angebot | 8500 | control | limousine | 1997 | automatik | 286 | 7er | 150000 | 6 | benzin | bmw | nein | 2016-04-04 00:00:00 | 0 | 71034 | 2016-04-06 14:45:08 |
2 | 2016-03-26 18:57:24 | Volkswagen_Golf_1.6_United | privat | Angebot | 8990 | test | limousine | 2009 | manuell | 102 | golf | 70000 | 7 | benzin | volkswagen | nein | 2016-03-26 00:00:00 | 0 | 35394 | 2016-04-06 20:15:37 |
3 | 2016-03-12 16:58:10 | Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... | privat | Angebot | 4350 | control | kleinwagen | 2007 | automatik | 71 | fortwo | 70000 | 6 | benzin | smart | nein | 2016-03-12 00:00:00 | 0 | 33729 | 2016-03-15 03:16:28 |
4 | 2016-04-01 14:38:50 | Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... | privat | Angebot | 1350 | test | kombi | 2003 | manuell | 0 | focus | 150000 | 7 | benzin | ford | nein | 2016-04-01 00:00:00 | 0 | 39218 | 2016-04-01 14:38:50 |
In the above code we removed outliers by dropping listings whose autos["price"] is higher than $3,890,000. The autos["odometer_km"] column doesn't seem to contain any outliers.
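The filter described above only has an upper bound. Since the minimum price is 0 and `$0` is the most frequent value in the describe output, a lower bound may also be worth considering. The sketch below shows a two-sided filter on a hypothetical `sample` of prices; the lower bound of 1 is an assumption made for illustration, not a choice taken from the notebook.

```python
import pandas as pd

# Hypothetical prices covering both ends of the range seen in the output above.
sample = pd.DataFrame({"price": [0, 1350, 5000, 194000, 3890000, 12345678, 99999999]})

# Upper bound as stated in the text; the lower bound of 1 is an assumption.
lower, upper = 1, 3_890_000

# Series.between is inclusive on both ends by default.
in_range = sample["price"].between(lower, upper)
filtered = sample[in_range]

# Share of rows the filter would remove.
removed_share = 1 - in_range.mean()
print(filtered["price"].tolist())
print(round(removed_share, 3))
```

Checking `removed_share` before committing to a cutoff shows how much data a given pair of bounds would discard.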
In the code below we analyse the ["date_crawled", "ad_created", "last_seen"] columns.
autos["date_crawled"].value_counts(normalize=True, dropna=False).sort_index()
2016-03-05 14:06:30 0.00002
2016-03-05 14:06:40 0.00002
2016-03-05 14:07:04 0.00002
2016-03-05 14:07:08 0.00002
2016-03-05 14:07:21 0.00002
2016-03-05 14:07:26 0.00002
2016-03-05 14:07:40 0.00002
2016-03-05 14:07:45 0.00002
2016-03-05 14:08:00 0.00004
2016-03-05 14:08:05 0.00004
2016-03-05 14:08:27 0.00002
2016-03-05 14:08:42 0.00002
2016-03-05 14:09:02 0.00004
2016-03-05 14:09:05 0.00002
2016-03-05 14:09:20 0.00002
2016-03-05 14:09:22 0.00002
2016-03-05 14:09:38 0.00002
2016-03-05 14:09:46 0.00002
2016-03-05 14:09:56 0.00002
2016-03-05 14:09:57 0.00002
2016-03-05 14:09:58 0.00004
2016-03-05 14:10:18 0.00002
2016-03-05 14:10:20 0.00002
2016-03-05 14:10:46 0.00002
2016-03-05 14:11:03 0.00002
2016-03-05 14:11:05 0.00002
2016-03-05 14:11:14 0.00002
2016-03-05 14:11:15 0.00002
2016-03-05 14:11:25 0.00002
2016-03-05 14:11:40 0.00002
...
2016-04-07 10:36:19 0.00002
2016-04-07 10:36:21 0.00002
2016-04-07 10:36:24 0.00002
2016-04-07 10:36:25 0.00002
2016-04-07 10:36:35 0.00002
2016-04-07 10:36:36 0.00002
2016-04-07 10:36:37 0.00004
2016-04-07 11:06:33 0.00002
2016-04-07 11:36:19 0.00002
2016-04-07 11:36:23 0.00002
2016-04-07 11:36:24 0.00002
2016-04-07 11:36:25 0.00002
2016-04-07 11:36:34 0.00004
2016-04-07 11:36:35 0.00002
2016-04-07 12:06:19 0.00002
2016-04-07 12:06:23 0.00002
2016-04-07 12:25:34 0.00002
2016-04-07 12:25:35 0.00002
2016-04-07 13:06:18 0.00002
2016-04-07 13:25:39 0.00002
2016-04-07 13:36:19 0.00002
2016-04-07 13:36:20 0.00002
2016-04-07 13:36:37 0.00002
2016-04-07 13:36:38 0.00002
2016-04-07 14:07:04 0.00002
2016-04-07 14:30:09 0.00002
2016-04-07 14:30:26 0.00002
2016-04-07 14:36:44 0.00002
2016-04-07 14:36:55 0.00002
2016-04-07 14:36:56 0.00002
Name: date_crawled, Length: 48213, dtype: float64
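In the ["date_crawled"] column the timestamps run from 5 March 2016 to 7 April 2016, with most of the listings crawled in March 2016 and fewer in April 2016. Because each value includes the time of day, nearly all of them are distinct (48,213 unique values in 50,000 rows), so each one accounts for about the same tiny percentage of the rows.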
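Because full timestamps are almost all unique, a per-day distribution is usually easier to read. A minimal sketch, assuming the values are still strings in the `YYYY-MM-DD HH:MM:SS` layout shown above: slice off the first ten characters to keep only the date. The `sample` series and the name `daily_share` are hypothetical.

```python
import pandas as pd

# Hypothetical timestamps in the same string format as the date_crawled column.
sample = pd.Series([
    "2016-03-05 14:06:30",
    "2016-03-05 14:06:40",
    "2016-03-26 17:47:46",
    "2016-04-07 14:36:56",
], name="date_crawled")

# The first ten characters hold the date part (YYYY-MM-DD).
daily_share = (
    sample.str[:10]
    .value_counts(normalize=True, dropna=False)
    .sort_index()
)
print(daily_share)
```

On the real column this would be `autos["date_crawled"].str[:10]`, giving one row per crawl day instead of one row per second.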
autos["ad_created"].value_counts(normalize=True, dropna=False).sort_index()
2015-06-11 00:00:00 0.00002
2015-08-10 00:00:00 0.00002
2015-09-09 00:00:00 0.00002
2015-11-10 00:00:00 0.00002
2015-12-05 00:00:00 0.00002
2015-12-30 00:00:00 0.00002
2016-01-03 00:00:00 0.00002
2016-01-07 00:00:00 0.00002
2016-01-10 00:00:00 0.00004
2016-01-13 00:00:00 0.00002
2016-01-14 00:00:00 0.00002
2016-01-16 00:00:00 0.00002
2016-01-22 00:00:00 0.00002
2016-01-27 00:00:00 0.00006
2016-01-29 00:00:00 0.00002
2016-02-01 00:00:00 0.00002
2016-02-02 00:00:00 0.00004
2016-02-05 00:00:00 0.00004
2016-02-07 00:00:00 0.00002
2016-02-08 00:00:00 0.00002
2016-02-09 00:00:00 0.00004
2016-02-11 00:00:00 0.00002
2016-02-12 00:00:00 0.00006
2016-02-14 00:00:00 0.00004
2016-02-16 00:00:00 0.00002
2016-02-17 00:00:00 0.00002
2016-02-18 00:00:00 0.00004
2016-02-19 00:00:00 0.00006
2016-02-20 00:00:00 0.00004
2016-02-21 00:00:00 0.00006
...
2016-03-09 00:00:00 0.03324
2016-03-10 00:00:00 0.03186
2016-03-11 00:00:00 0.03278
2016-03-12 00:00:00 0.03662
2016-03-13 00:00:00 0.01692
2016-03-14 00:00:00 0.03522
2016-03-15 00:00:00 0.03374
2016-03-16 00:00:00 0.03000
2016-03-17 00:00:00 0.03120
2016-03-18 00:00:00 0.01372
2016-03-19 00:00:00 0.03384
2016-03-20 00:00:00 0.03786
2016-03-21 00:00:00 0.03772
2016-03-22 00:00:00 0.03280
2016-03-23 00:00:00 0.03218
2016-03-24 00:00:00 0.02908
2016-03-25 00:00:00 0.03188
2016-03-26 00:00:00 0.03256
2016-03-27 00:00:00 0.03090
2016-03-28 00:00:00 0.03496
2016-03-29 00:00:00 0.03414
2016-03-30 00:00:00 0.03344
2016-03-31 00:00:00 0.03192
2016-04-01 00:00:00 0.03380
2016-04-02 00:00:00 0.03508
2016-04-03 00:00:00 0.03892
2016-04-04 00:00:00 0.03688
2016-04-05 00:00:00 0.01184
2016-04-06 00:00:00 0.00326
2016-04-07 00:00:00 0.00128
Name: ad_created, Length: 76, dtype: float64
As with the previous column, the majority of ads were created in March 2016 and early April 2016, although a few date back as far as June 2015. Unlike the ["date_crawled"] column, these values contain only a date, so the shares differ noticeably from day to day.
autos["last_seen"].value_counts(normalize=True, dropna=False).sort_index()
2016-03-05 14:45:46 0.00002 2016-03-05 14:46:02 0.00002 2016-03-05 14:49:34 0.00002 2016-03-05 15:16:11 0.00002 2016-03-05 15:16:47 0.00002 2016-03-05 15:28:10 0.00002 2016-03-05 15:41:30 0.00002 2016-03-05 15:45:43 0.00002 2016-03-05 15:47:38 0.00002 2016-03-05 15:47:44 0.00002 2016-03-05 16:45:57 0.00002 2016-03-05 16:47:28 0.00002 2016-03-05 17:15:45 0.00002 2016-03-05 17:16:12 0.00002 2016-03-05 17:16:14 0.00002 2016-03-05 17:16:23 0.00002 2016-03-05 17:17:02 0.00002 2016-03-05 17:39:19 0.00002 2016-03-05 17:40:14 0.00002 2016-03-05 17:44:50 0.00002 2016-03-05 17:44:54 0.00002 2016-03-05 17:46:01 0.00002 2016-03-05 18:17:58 0.00002 2016-03-05 18:47:14 0.00002 2016-03-05 18:50:38 0.00002 2016-03-05 19:15:08 0.00002 2016-03-05 19:15:20 0.00002 2016-03-05 19:15:42 0.00002 2016-03-05 19:16:36 0.00002 2016-03-05 19:17:17 0.00002 ... 2016-04-07 14:58:09 0.00004 2016-04-07 14:58:10 0.00004 2016-04-07 14:58:12 0.00004 2016-04-07 14:58:13 0.00002 2016-04-07 14:58:14 0.00002 2016-04-07 14:58:17 0.00006 2016-04-07 14:58:18 0.00010 2016-04-07 14:58:20 0.00002 2016-04-07 14:58:21 0.00006 2016-04-07 14:58:22 0.00002 2016-04-07 14:58:24 0.00004 2016-04-07 14:58:25 0.00002 2016-04-07 14:58:26 0.00004 2016-04-07 14:58:27 0.00004 2016-04-07 14:58:28 0.00004 2016-04-07 14:58:29 0.00006 2016-04-07 14:58:31 0.00004 2016-04-07 14:58:33 0.00004 2016-04-07 14:58:34 0.00004 2016-04-07 14:58:36 0.00006 2016-04-07 14:58:37 0.00002 2016-04-07 14:58:38 0.00002 2016-04-07 14:58:40 0.00002 2016-04-07 14:58:41 0.00002 2016-04-07 14:58:42 0.00004 2016-04-07 14:58:44 0.00006 2016-04-07 14:58:45 0.00002 2016-04-07 14:58:46 0.00002 2016-04-07 14:58:48 0.00006 2016-04-07 14:58:50 0.00008 Name: last_seen, Length: 39481, dtype: float64
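The ["last_seen"] values run from 5 March 2016 to 7 April 2016, with the majority falling in April 2016 and fewer in March 2016. Because the counts here are taken on full timestamps rather than on dates, almost every value is distinct (39,481 unique values), so each one has a similarly tiny share and the distribution by day is hard to read from this output.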
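One way to make the daily distribution visible is to count on the date part only. The sketch below assumes the column holds strings in the "YYYY-MM-DD HH:MM:SS" form shown above (it was read from CSV without date parsing); the `sample` frame and the `last_seen_by_day` name are made up for illustration and stand in for the real `autos` data.

```python
import pandas as pd

# Hypothetical sample standing in for the real autos["last_seen"] column.
sample = pd.DataFrame({
    "last_seen": [
        "2016-03-05 14:45:46",
        "2016-04-06 06:45:54",
        "2016-04-06 14:45:08",
        "2016-04-07 14:58:50",
    ]
})

# Keep only the first 10 characters ("YYYY-MM-DD") so counts are per day,
# then compute each day's share of rows and order the days chronologically.
last_seen_by_day = (
    sample["last_seen"]
    .str[:10]
    .value_counts(normalize=True, dropna=False)
    .sort_index()
)
print(last_seen_by_day)
```

On the real data the same chain would be applied to autos["last_seen"], giving one row per day instead of one row per timestamp.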
autos["registration_year"].describe()
count    50000.000000
mean      2005.073280
std        105.712813
min       1000.000000
25%       1999.000000
50%       2003.000000
75%       2008.000000
max       9999.000000
Name: registration_year, dtype: float64
The minimum and maximum values of the ["registration_year"] column are clearly invalid. The year 1000 cannot be a real registration date, since it predates the automobile by centuries, and the maximum of 9999 lies far in the future. We'll keep only the rows with a registration year between 1885, around the time the first automobiles were built, and 2016, the year the data was collected.
autos_new_reg = autos[autos["registration_year"].between(1885, 2016)]
autos_new_reg["registration_year"].value_counts(normalize=True).sort_index()
1910    0.000187
1927    0.000021
1929    0.000021
1931    0.000021
1934    0.000042
1937    0.000083
1938    0.000021
1939    0.000021
1941    0.000042
1943    0.000021
1948    0.000021
1950    0.000062
1951    0.000042
1952    0.000021
1953    0.000021
1954    0.000042
1955    0.000042
1956    0.000104
1957    0.000042
1958    0.000083
1959    0.000146
1960    0.000708
1961    0.000125
1962    0.000083
1963    0.000187
1964    0.000250
1965    0.000354
1966    0.000458
1967    0.000562
1968    0.000541
...
1987    0.001562
1988    0.002957
1989    0.003769
1990    0.008224
1991    0.007412
1992    0.008141
1993    0.009265
1994    0.013742
1995    0.027338
1996    0.030066
1997    0.042225
1998    0.051074
1999    0.062464
2000    0.069834
2001    0.056280
2002    0.052740
2003    0.056779
2004    0.056988
2005    0.062776
2006    0.056384
2007    0.047972
2008    0.046452
2009    0.043683
2010    0.033251
2011    0.034022
2012    0.027546
2013    0.016782
2014    0.013867
2015    0.008308
2016    0.027401
Name: registration_year, Length: 78, dtype: float64
Most registrations fall between the late 1990s and the 2000s, with 2000 the most common year shown (about 7% of listings). Very few cars were registered before 1987. After filtering, the registration years range from 1910 to 2016.
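Before discarding rows it can be useful to check how much data the filter removes. The following is a minimal sketch on a made-up `sample` frame (not the real `autos` data); `in_range`, `share_dropped` and `filtered` are illustrative names. It relies on `Series.between` being inclusive on both ends by default.

```python
import pandas as pd

# Hypothetical sample; the real column is autos["registration_year"].
sample = pd.DataFrame(
    {"registration_year": [1000, 1910, 1999, 2005, 2016, 2017, 9999]}
)

# Boolean mask: True where the year lies in [1885, 2016], both ends included.
in_range = sample["registration_year"].between(1885, 2016)

# Share of rows that the filter would discard.
share_dropped = (~in_range).mean()
print(f"{share_dropped:.2%} of rows fall outside 1885-2016")

# .copy() gives an independent frame, which avoids chained-assignment
# warnings if the filtered data is modified later.
filtered = sample[in_range].copy()
print(filtered["registration_year"].describe())
```

If the discarded share is small, as one would hope for obviously invalid years, dropping the rows is a reasonable choice.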
top_20_brands = autos["brand"].value_counts().sort_values(ascending=False).head(20)
print(top_20_brands)
volkswagen        10687
opel               5461
bmw                5429
mercedes_benz      4734
audi               4283
ford               3479
renault            2404
peugeot            1456
fiat               1308
seat                941
skoda               786
mazda               757
nissan              754
smart               701
citroen             701
toyota              617
sonstige_autos      546
hyundai             488
volvo               457
mini                424
Name: brand, dtype: int64
The code above counts the listings for each brand and keeps the twenty most common brands. Note that the aggregation below does not use this list: it runs on the first five brands in alphabetical order.
# Mean listing price for a subset of brands.
mean_price_by_brand = {}
brands = autos["brand"].unique()
brands_sorted = sorted(brands)
# Despite its name, this holds the first five brands in alphabetical order.
brands_top_6 = brands_sorted[:5]
for row in brands_top_6:
    selected_rows = autos[autos["brand"] == row]
    mean = selected_rows["price"].mean()
    mean_price_by_brand[row] = round(mean, 2)
# Sort the (brand, mean price) pairs from most to least expensive.
sorted_brand_list = sorted(mean_price_by_brand.items(),
                           key=lambda x: x[1], reverse=True)
for i in sorted_brand_list:
    print(i[0], "$" + str(i[1]))
audi $8965.56
bmw $8252.92
chevrolet $6432.93
alfa_romeo $3943.56
chrysler $3286.06
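The code above computes the average listing price for each selected brand. Among the five brands printed, the averages range from chrysler's $3286.06 to audi's $8965.56. Across all brands, the averages vary from daewoo's low of $1038.35 to porsche's high of $44537.98 (these two brands are not among the five printed above).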
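The loop above picks its brands alphabetically. As an alternative, the sketch below shows how the most common brands could drive the aggregation instead, using `value_counts` to rank brands and `groupby` to compute the means. The `sample` frame, the cutoff of two brands, and the names `top_brands` and `mean_price` are all made up for illustration; this is not the notebook's own method.

```python
import pandas as pd

# Hypothetical sample; column names follow the notebook ("brand", "price").
sample = pd.DataFrame({
    "brand": ["audi", "audi", "bmw", "bmw", "bmw", "fiat"],
    "price": [9000, 7000, 8000, 10000, 9000, 1500],
})

# Rank brands by number of listings and keep the n most common (n=2 here).
top_brands = sample["brand"].value_counts().head(2).index

# Mean price per brand, restricted to those brands, highest first.
mean_price = (
    sample[sample["brand"].isin(top_brands)]
    .groupby("brand")["price"]
    .mean()
    .round(2)
    .sort_values(ascending=False)
)
print(mean_price)
```

On the real data, `head(20)` would reproduce the top-twenty selection, and the result is a Series indexed by brand rather than a dictionary.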
# Mean odometer reading (km) for the same five brands.
mean_mileage_by_brand = {}
for row in brands_top_6:
    selected_rows = autos[autos["brand"] == row]
    mean = selected_rows["odometer_km"].mean()
    mean_mileage_by_brand[row] = round(mean, 2)
# Sort the (brand, mean mileage) pairs from highest to lowest mileage.
sorted_brand_list = sorted(mean_mileage_by_brand.items(),
                           key=lambda x: x[1], reverse=True)
for i in sorted_brand_list:
    print(i[0], str(i[1]) + "km")
chrysler 133149.17km
bmw 132521.64km
alfa_romeo 131109.42km
audi 129643.94km
chevrolet 99522.97km
bmp_series = pd.Series(mean_price_by_brand)
bmm_series = pd.Series(mean_mileage_by_brand)
df = pd.DataFrame(bmp_series, columns=['mean_price_$'])
df["mean_mileage_km"] = bmm_series
print(df)
            mean_price_$  mean_mileage_km
alfa_romeo       3943.56        131109.42
audi             8965.56        129643.94
bmw              8252.92        132521.64
chevrolet        6432.93         99522.97
chrysler         3286.06        133149.17
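Among these five brands, audi and bmw have the highest average prices. chrysler has the highest average mileage and also the lowest average price, while chevrolet stands out with a much lower average mileage (about 99,500 km) than the other four brands, which all average around 130,000 km.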
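The two loops and the manual DataFrame assembly can also be expressed as a single grouped aggregation. The sketch below uses pandas named aggregation (available from pandas 0.25) on a made-up `sample` frame. The output column is called `mean_price` rather than `mean_price_$` because keyword arguments cannot contain `$`; `sample` and `summary` are illustrative names.

```python
import pandas as pd

# Hypothetical sample; column names follow the notebook.
sample = pd.DataFrame({
    "brand": ["audi", "audi", "bmw", "bmw"],
    "price": [9000, 7000, 8000, 10000],
    "odometer_km": [125000, 150000, 100000, 150000],
})

# One row per brand, with the mean price and mean mileage side by side.
summary = (
    sample.groupby("brand")
    .agg(
        mean_price=("price", "mean"),
        mean_mileage_km=("odometer_km", "mean"),
    )
    .round(2)
)
print(summary)
```

Keeping both averages in one frame makes it easy to sort by either column, for example with summary.sort_values("mean_price", ascending=False).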