The dataset was originally scraped and uploaded to Kaggle; the version used here has a few modifications from the original. The data dictionary provided with the data is as follows:

- dateCrawled - When this ad was first crawled. All field values are taken from this date.
- name - Name of the car.
- seller - Whether the seller is private or a dealer.
- offerType - The type of listing.
- price - The price on the ad to sell the car.
- abtest - Whether the listing is included in an A/B test.
- vehicleType - The vehicle type.
- yearOfRegistration - The year in which the car was first registered.
- gearbox - The transmission type.
- powerPS - The power of the car in PS.
- model - The car model name.
- kilometer - How many kilometers the car has driven.
- monthOfRegistration - The month in which the car was first registered.
- fuelType - What type of fuel the car uses.
- brand - The brand of the car.
- notRepairedDamage - Whether the car has damage that has not yet been repaired.
- dateCreated - The date on which the eBay listing was created.
- nrOfPictures - The number of pictures in the ad.
- postalCode - The postal code for the location of the vehicle.
- lastSeenOnline - When the crawler last saw this ad online.
import pandas as pd
import numpy as np
autos = pd.read_csv("autos.csv", encoding = "Latin-1")
autos
dateCrawled | name | seller | offerType | price | abtest | vehicleType | yearOfRegistration | gearbox | powerPS | model | odometer | monthOfRegistration | fuelType | brand | notRepairedDamage | dateCreated | nrOfPictures | postalCode | lastSeen | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-26 17:47:46 | Peugeot_807_160_NAVTECH_ON_BOARD | privat | Angebot | $5,000 | control | bus | 2004 | manuell | 158 | andere | 150,000km | 3 | lpg | peugeot | nein | 2016-03-26 00:00:00 | 0 | 79588 | 2016-04-06 06:45:54 |
1 | 2016-04-04 13:38:56 | BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik | privat | Angebot | $8,500 | control | limousine | 1997 | automatik | 286 | 7er | 150,000km | 6 | benzin | bmw | nein | 2016-04-04 00:00:00 | 0 | 71034 | 2016-04-06 14:45:08 |
2 | 2016-03-26 18:57:24 | Volkswagen_Golf_1.6_United | privat | Angebot | $8,990 | test | limousine | 2009 | manuell | 102 | golf | 70,000km | 7 | benzin | volkswagen | nein | 2016-03-26 00:00:00 | 0 | 35394 | 2016-04-06 20:15:37 |
3 | 2016-03-12 16:58:10 | Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... | privat | Angebot | $4,350 | control | kleinwagen | 2007 | automatik | 71 | fortwo | 70,000km | 6 | benzin | smart | nein | 2016-03-12 00:00:00 | 0 | 33729 | 2016-03-15 03:16:28 |
4 | 2016-04-01 14:38:50 | Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... | privat | Angebot | $1,350 | test | kombi | 2003 | manuell | 0 | focus | 150,000km | 7 | benzin | ford | nein | 2016-04-01 00:00:00 | 0 | 39218 | 2016-04-01 14:38:50 |
5 | 2016-03-21 13:47:45 | Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto... | privat | Angebot | $7,900 | test | bus | 2006 | automatik | 150 | voyager | 150,000km | 4 | diesel | chrysler | NaN | 2016-03-21 00:00:00 | 0 | 22962 | 2016-04-06 09:45:21 |
6 | 2016-03-20 17:55:21 | VW_Golf_III_GT_Special_Electronic_Green_Metall... | privat | Angebot | $300 | test | limousine | 1995 | manuell | 90 | golf | 150,000km | 8 | benzin | volkswagen | NaN | 2016-03-20 00:00:00 | 0 | 31535 | 2016-03-23 02:48:59 |
7 | 2016-03-16 18:55:19 | Golf_IV_1.9_TDI_90PS | privat | Angebot | $1,990 | control | limousine | 1998 | manuell | 90 | golf | 150,000km | 12 | diesel | volkswagen | nein | 2016-03-16 00:00:00 | 0 | 53474 | 2016-04-07 03:17:32 |
8 | 2016-03-22 16:51:34 | Seat_Arosa | privat | Angebot | $250 | test | NaN | 2000 | manuell | 0 | arosa | 150,000km | 10 | NaN | seat | nein | 2016-03-22 00:00:00 | 0 | 7426 | 2016-03-26 18:18:10 |
9 | 2016-03-16 13:47:02 | Renault_Megane_Scenic_1.6e_RT_Klimaanlage | privat | Angebot | $590 | control | bus | 1997 | manuell | 90 | megane | 150,000km | 7 | benzin | renault | nein | 2016-03-16 00:00:00 | 0 | 15749 | 2016-04-06 10:46:35 |
10 | 2016-03-15 01:41:36 | VW_Golf_Tuning_in_siber/grau | privat | Angebot | $999 | test | NaN | 2017 | manuell | 90 | NaN | 150,000km | 4 | benzin | volkswagen | nein | 2016-03-14 00:00:00 | 0 | 86157 | 2016-04-07 03:16:21 |
11 | 2016-03-16 18:45:34 | Mercedes_A140_Motorschaden | privat | Angebot | $350 | control | NaN | 2000 | NaN | 0 | NaN | 150,000km | 0 | benzin | mercedes_benz | NaN | 2016-03-16 00:00:00 | 0 | 17498 | 2016-03-16 18:45:34 |
12 | 2016-03-31 19:48:22 | Smart_smart_fortwo_coupe_softouch_pure_MHD_Pan... | privat | Angebot | $5,299 | control | kleinwagen | 2010 | automatik | 71 | fortwo | 50,000km | 9 | benzin | smart | nein | 2016-03-31 00:00:00 | 0 | 34590 | 2016-04-06 14:17:52 |
13 | 2016-03-23 10:48:32 | Audi_A3_1.6_tuning | privat | Angebot | $1,350 | control | limousine | 1999 | manuell | 101 | a3 | 150,000km | 11 | benzin | audi | nein | 2016-03-23 00:00:00 | 0 | 12043 | 2016-04-01 14:17:13 |
14 | 2016-03-23 11:50:46 | Renault_Clio_3__Dynamique_1.2__16_V;_viele_Ver... | privat | Angebot | $3,999 | test | kleinwagen | 2007 | manuell | 75 | clio | 150,000km | 9 | benzin | renault | NaN | 2016-03-23 00:00:00 | 0 | 81737 | 2016-04-01 15:46:47 |
15 | 2016-04-01 12:06:20 | Corvette_C3_Coupe_T_Top_Crossfire_Injection | privat | Angebot | $18,900 | test | coupe | 1982 | automatik | 203 | NaN | 80,000km | 6 | benzin | sonstige_autos | nein | 2016-04-01 00:00:00 | 0 | 61276 | 2016-04-02 21:10:48 |
16 | 2016-03-16 14:59:02 | Opel_Vectra_B_Kombi | privat | Angebot | $350 | test | kombi | 1999 | manuell | 101 | vectra | 150,000km | 5 | benzin | opel | nein | 2016-03-16 00:00:00 | 0 | 57299 | 2016-03-18 05:29:37 |
17 | 2016-03-29 11:46:22 | Volkswagen_Scirocco_2_G60 | privat | Angebot | $5,500 | test | coupe | 1990 | manuell | 205 | scirocco | 150,000km | 6 | benzin | volkswagen | nein | 2016-03-29 00:00:00 | 0 | 74821 | 2016-04-05 20:46:26 |
18 | 2016-03-26 19:57:44 | Verkaufen_mein_bmw_e36_320_i_touring | privat | Angebot | $300 | control | bus | 1995 | manuell | 150 | 3er | 150,000km | 0 | benzin | bmw | NaN | 2016-03-26 00:00:00 | 0 | 54329 | 2016-04-02 12:16:41 |
19 | 2016-03-17 13:36:21 | mazda_tribute_2.0_mit_gas_und_tuev_neu_2018 | privat | Angebot | $4,150 | control | suv | 2004 | manuell | 124 | andere | 150,000km | 2 | lpg | mazda | nein | 2016-03-17 00:00:00 | 0 | 40878 | 2016-03-17 14:45:58 |
20 | 2016-03-05 19:57:31 | Audi_A4_Avant_1.9_TDI_*6_Gang*AHK*Klimatronik*... | privat | Angebot | $3,500 | test | kombi | 2003 | manuell | 131 | a4 | 150,000km | 5 | diesel | audi | NaN | 2016-03-05 00:00:00 | 0 | 53913 | 2016-03-07 05:46:46 |
21 | 2016-03-06 19:07:10 | Porsche_911_Carrera_4S_Cabrio | privat | Angebot | $41,500 | test | cabrio | 2004 | manuell | 320 | 911 | 150,000km | 4 | benzin | porsche | nein | 2016-03-06 00:00:00 | 0 | 65428 | 2016-04-05 23:46:19 |
22 | 2016-03-28 20:50:54 | MINI_Cooper_S_Cabrio | privat | Angebot | $25,450 | control | cabrio | 2015 | manuell | 184 | cooper | 10,000km | 1 | benzin | mini | nein | 2016-03-28 00:00:00 | 0 | 44789 | 2016-04-01 06:45:30 |
23 | 2016-03-10 19:55:34 | Peugeot_Boxer_2_2_HDi_120_Ps_9_Sitzer_inkl_Klima | privat | Angebot | $7,999 | control | bus | 2010 | manuell | 120 | NaN | 150,000km | 2 | diesel | peugeot | nein | 2016-03-10 00:00:00 | 0 | 30900 | 2016-03-17 08:45:17 |
24 | 2016-04-03 11:57:02 | BMW_535i_xDrive_Sport_Aut. | privat | Angebot | $48,500 | control | limousine | 2014 | automatik | 306 | 5er | 30,000km | 12 | benzin | bmw | nein | 2016-04-03 00:00:00 | 0 | 22547 | 2016-04-07 13:16:50 |
25 | 2016-03-21 21:56:18 | Ford_escort_kombi_an_bastler_mit_ghia_ausstattung | privat | Angebot | $90 | control | kombi | 1996 | manuell | 116 | NaN | 150,000km | 4 | benzin | ford | ja | 2016-03-21 00:00:00 | 0 | 27574 | 2016-04-01 05:16:49 |
26 | 2016-04-03 22:46:28 | Volkswagen_Polo_Fox | privat | Angebot | $777 | control | kleinwagen | 1992 | manuell | 54 | polo | 125,000km | 2 | benzin | volkswagen | nein | 2016-04-03 00:00:00 | 0 | 38110 | 2016-04-05 23:46:48 |
27 | 2016-03-27 18:45:01 | Hat_einer_Ahnung_mit_Ford_Galaxy_HILFE | privat | Angebot | $0 | control | NaN | 2005 | NaN | 0 | NaN | 150,000km | 0 | NaN | ford | NaN | 2016-03-27 00:00:00 | 0 | 66701 | 2016-03-27 18:45:01 |
28 | 2016-03-19 21:56:19 | MINI_Cooper_D | privat | Angebot | $5,250 | control | kleinwagen | 2007 | manuell | 110 | cooper | 150,000km | 7 | diesel | mini | ja | 2016-03-19 00:00:00 | 0 | 15745 | 2016-04-07 14:58:48 |
29 | 2016-04-02 12:45:44 | Mercedes_Benz_E_320_T_CDI_Avantgarde_DPF7_Sitz... | privat | Angebot | $4,999 | test | kombi | 2004 | automatik | 204 | e_klasse | 150,000km | 10 | diesel | mercedes_benz | nein | 2016-04-02 00:00:00 | 0 | 47638 | 2016-04-02 12:45:44 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
49970 | 2016-03-21 22:47:37 | c4_Grand_Picasso_mit_Automatik_Leder_Navi_Temp... | privat | Angebot | $15,800 | control | bus | 2010 | automatik | 136 | c4 | 60,000km | 4 | diesel | citroen | nein | 2016-03-21 00:00:00 | 0 | 14947 | 2016-04-07 04:17:34 |
49971 | 2016-03-29 14:54:12 | W.Lupo_1.0 | privat | Angebot | $950 | test | kleinwagen | 2001 | manuell | 50 | lupo | 150,000km | 4 | benzin | volkswagen | nein | 2016-03-29 00:00:00 | 0 | 65197 | 2016-03-29 20:41:51 |
49972 | 2016-03-26 22:25:23 | Mercedes_Benz_Vito_115_CDI_Extralang_Aut. | privat | Angebot | $3,300 | control | bus | 2004 | automatik | 150 | vito | 150,000km | 10 | diesel | mercedes_benz | ja | 2016-03-26 00:00:00 | 0 | 65326 | 2016-03-28 11:28:18 |
49973 | 2016-03-27 05:32:39 | Mercedes_Benz_SLK_200_Kompressor | privat | Angebot | $6,000 | control | cabrio | 2004 | manuell | 163 | slk | 150,000km | 11 | benzin | mercedes_benz | nein | 2016-03-27 00:00:00 | 0 | 53567 | 2016-03-27 08:25:24 |
49974 | 2016-03-20 10:52:31 | Golf_1_Cabrio_Tuev_Neu_viele_Extras_alles_eing... | privat | Angebot | $0 | control | cabrio | 1983 | manuell | 70 | golf | 150,000km | 2 | benzin | volkswagen | nein | 2016-03-20 00:00:00 | 0 | 8209 | 2016-03-27 19:48:16 |
49975 | 2016-03-27 20:51:39 | Honda_Jazz_1.3_DSi_i_VTEC_IMA_CVT_Comfort | privat | Angebot | $9,700 | control | kleinwagen | 2012 | automatik | 88 | jazz | 100,000km | 11 | hybrid | honda | nein | 2016-03-27 00:00:00 | 0 | 84385 | 2016-04-05 19:45:34 |
49976 | 2016-03-19 18:56:05 | Audi_80_Avant_2.6_E__Vollausstattung!!_Einziga... | privat | Angebot | $5,900 | test | kombi | 1992 | automatik | 150 | 80 | 150,000km | 12 | benzin | audi | nein | 2016-03-19 00:00:00 | 0 | 36100 | 2016-04-07 06:16:44 |
49977 | 2016-03-31 18:37:18 | Mercedes_Benz_C200_Cdi_W203 | privat | Angebot | $5,500 | control | limousine | 2003 | manuell | 116 | c_klasse | 150,000km | 2 | diesel | mercedes_benz | nein | 2016-03-31 00:00:00 | 0 | 33739 | 2016-04-06 12:16:11 |
49978 | 2016-04-04 10:37:14 | Mercedes_Benz_E_200_Classic | privat | Angebot | $900 | control | limousine | 1996 | automatik | 136 | e_klasse | 150,000km | 9 | benzin | mercedes_benz | ja | 2016-04-04 00:00:00 | 0 | 24405 | 2016-04-06 12:44:20 |
49979 | 2016-03-20 18:38:40 | Volkswagen_Polo_1.6_TDI_Style | privat | Angebot | $11,000 | test | kleinwagen | 2011 | manuell | 90 | polo | 70,000km | 11 | diesel | volkswagen | nein | 2016-03-20 00:00:00 | 0 | 48455 | 2016-04-07 01:45:12 |
49980 | 2016-03-12 10:55:54 | Ford_Escort_Turnier_16V | privat | Angebot | $400 | control | kombi | 1995 | manuell | 105 | escort | 125,000km | 3 | benzin | ford | NaN | 2016-03-12 00:00:00 | 0 | 56218 | 2016-04-06 17:16:49 |
49981 | 2016-03-15 09:38:21 | Opel_Astra_Kombi_mit_Anhaengerkupplung | privat | Angebot | $2,000 | control | kombi | 1998 | manuell | 115 | astra | 150,000km | 12 | benzin | opel | nein | 2016-03-15 00:00:00 | 0 | 86859 | 2016-04-05 17:21:46 |
49982 | 2016-03-29 18:51:08 | Skoda_Fabia_4_Tuerer_Bj:2004__85.000Tkm | privat | Angebot | $1,950 | control | kleinwagen | 2004 | manuell | 0 | fabia | 90,000km | 7 | benzin | skoda | NaN | 2016-03-29 00:00:00 | 0 | 45884 | 2016-03-29 18:51:08 |
49983 | 2016-03-06 12:43:04 | Ford_focus_99 | privat | Angebot | $600 | test | kleinwagen | 1999 | manuell | 101 | focus | 150,000km | 4 | benzin | ford | NaN | 2016-03-06 00:00:00 | 0 | 52477 | 2016-03-09 06:16:08 |
49984 | 2016-03-31 22:48:48 | Student_sucht_ein__Anfaengerauto___ab_2000_BJ_... | privat | Angebot | $0 | test | NaN | 2000 | NaN | 0 | NaN | 150,000km | 0 | NaN | sonstige_autos | NaN | 2016-03-31 00:00:00 | 0 | 12103 | 2016-04-02 19:44:53 |
49985 | 2016-04-02 16:38:23 | Verkaufe_meinen_vw_vento! | privat | Angebot | $1,000 | control | NaN | 1995 | automatik | 0 | NaN | 150,000km | 0 | benzin | volkswagen | NaN | 2016-04-02 00:00:00 | 0 | 30900 | 2016-04-06 15:17:52 |
49986 | 2016-04-04 20:46:02 | Chrysler_300C_3.0_CRD_DPF_Automatik_Voll_Ausst... | privat | Angebot | $15,900 | control | limousine | 2010 | automatik | 218 | 300c | 125,000km | 11 | diesel | chrysler | nein | 2016-04-04 00:00:00 | 0 | 73527 | 2016-04-06 23:16:00 |
49987 | 2016-03-22 20:47:27 | Audi_A3_Limousine_2.0_TDI_DPF_Ambition__NAVI__... | privat | Angebot | $21,990 | control | limousine | 2013 | manuell | 150 | a3 | 50,000km | 11 | diesel | audi | nein | 2016-03-22 00:00:00 | 0 | 94362 | 2016-03-26 22:46:06 |
49988 | 2016-03-28 19:49:51 | BMW_330_Ci | privat | Angebot | $9,550 | control | coupe | 2001 | manuell | 231 | 3er | 150,000km | 10 | benzin | bmw | nein | 2016-03-28 00:00:00 | 0 | 83646 | 2016-04-07 02:17:40 |
49989 | 2016-03-11 19:50:37 | VW_Polo_zum_Ausschlachten_oder_Wiederaufbau | privat | Angebot | $150 | test | kleinwagen | 1997 | manuell | 0 | polo | 150,000km | 5 | benzin | volkswagen | ja | 2016-03-11 00:00:00 | 0 | 21244 | 2016-03-12 10:17:55 |
49990 | 2016-03-21 19:54:19 | Mercedes_Benz_A_200__BlueEFFICIENCY__Urban | privat | Angebot | $17,500 | test | limousine | 2012 | manuell | 156 | a_klasse | 30,000km | 12 | benzin | mercedes_benz | nein | 2016-03-21 00:00:00 | 0 | 58239 | 2016-04-06 22:46:57 |
49991 | 2016-03-06 15:25:19 | Kleinwagen | privat | Angebot | $500 | control | NaN | 2016 | manuell | 0 | twingo | 150,000km | 0 | benzin | renault | NaN | 2016-03-06 00:00:00 | 0 | 61350 | 2016-03-06 18:24:19 |
49992 | 2016-03-10 19:37:38 | Fiat_Grande_Punto_1.4_T_Jet_16V_Sport | privat | Angebot | $4,800 | control | kleinwagen | 2009 | manuell | 120 | andere | 125,000km | 9 | lpg | fiat | nein | 2016-03-10 00:00:00 | 0 | 68642 | 2016-03-13 01:44:51 |
49993 | 2016-03-15 18:47:35 | Audi_A3__1_8l__Silber;_schoenes_Fahrzeug | privat | Angebot | $1,650 | control | kleinwagen | 1997 | manuell | 0 | NaN | 150,000km | 7 | benzin | audi | NaN | 2016-03-15 00:00:00 | 0 | 65203 | 2016-04-06 19:46:53 |
49994 | 2016-03-22 17:36:42 | Audi_A6__S6__Avant_4.2_quattro_eventuell_Tausc... | privat | Angebot | $5,000 | control | kombi | 2001 | automatik | 299 | a6 | 150,000km | 1 | benzin | audi | nein | 2016-03-22 00:00:00 | 0 | 46537 | 2016-04-06 08:16:39 |
49995 | 2016-03-27 14:38:19 | Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon | privat | Angebot | $24,900 | control | limousine | 2011 | automatik | 239 | q5 | 100,000km | 1 | diesel | audi | nein | 2016-03-27 00:00:00 | 0 | 82131 | 2016-04-01 13:47:40 |
49996 | 2016-03-28 10:50:25 | Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+... | privat | Angebot | $1,980 | control | cabrio | 1996 | manuell | 75 | astra | 150,000km | 5 | benzin | opel | nein | 2016-03-28 00:00:00 | 0 | 44807 | 2016-04-02 14:18:02 |
49997 | 2016-04-02 14:44:48 | Fiat_500_C_1.2_Dualogic_Lounge | privat | Angebot | $13,200 | test | cabrio | 2014 | automatik | 69 | 500 | 5,000km | 11 | benzin | fiat | nein | 2016-04-02 00:00:00 | 0 | 73430 | 2016-04-04 11:47:27 |
49998 | 2016-03-08 19:25:42 | Audi_A3_2.0_TDI_Sportback_Ambition | privat | Angebot | $22,900 | control | kombi | 2013 | manuell | 150 | a3 | 40,000km | 11 | diesel | audi | nein | 2016-03-08 00:00:00 | 0 | 35683 | 2016-04-05 16:45:07 |
49999 | 2016-03-14 00:42:12 | Opel_Vectra_1.6_16V | privat | Angebot | $1,250 | control | limousine | 1996 | manuell | 101 | vectra | 150,000km | 1 | benzin | opel | nein | 2016-03-13 00:00:00 | 0 | 45897 | 2016-04-06 21:18:48 |
50000 rows × 20 columns
autos.info()
autos.head()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 50000 entries, 0 to 49999 Data columns (total 20 columns): dateCrawled 50000 non-null object name 50000 non-null object seller 50000 non-null object offerType 50000 non-null object price 50000 non-null object abtest 50000 non-null object vehicleType 44905 non-null object yearOfRegistration 50000 non-null int64 gearbox 47320 non-null object powerPS 50000 non-null int64 model 47242 non-null object odometer 50000 non-null object monthOfRegistration 50000 non-null int64 fuelType 45518 non-null object brand 50000 non-null object notRepairedDamage 40171 non-null object dateCreated 50000 non-null object nrOfPictures 50000 non-null int64 postalCode 50000 non-null int64 lastSeen 50000 non-null object dtypes: int64(5), object(15) memory usage: 7.6+ MB
dateCrawled | name | seller | offerType | price | abtest | vehicleType | yearOfRegistration | gearbox | powerPS | model | odometer | monthOfRegistration | fuelType | brand | notRepairedDamage | dateCreated | nrOfPictures | postalCode | lastSeen | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-26 17:47:46 | Peugeot_807_160_NAVTECH_ON_BOARD | privat | Angebot | $5,000 | control | bus | 2004 | manuell | 158 | andere | 150,000km | 3 | lpg | peugeot | nein | 2016-03-26 00:00:00 | 0 | 79588 | 2016-04-06 06:45:54 |
1 | 2016-04-04 13:38:56 | BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik | privat | Angebot | $8,500 | control | limousine | 1997 | automatik | 286 | 7er | 150,000km | 6 | benzin | bmw | nein | 2016-04-04 00:00:00 | 0 | 71034 | 2016-04-06 14:45:08 |
2 | 2016-03-26 18:57:24 | Volkswagen_Golf_1.6_United | privat | Angebot | $8,990 | test | limousine | 2009 | manuell | 102 | golf | 70,000km | 7 | benzin | volkswagen | nein | 2016-03-26 00:00:00 | 0 | 35394 | 2016-04-06 20:15:37 |
3 | 2016-03-12 16:58:10 | Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... | privat | Angebot | $4,350 | control | kleinwagen | 2007 | automatik | 71 | fortwo | 70,000km | 6 | benzin | smart | nein | 2016-03-12 00:00:00 | 0 | 33729 | 2016-03-15 03:16:28 |
4 | 2016-04-01 14:38:50 | Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... | privat | Angebot | $1,350 | test | kombi | 2003 | manuell | 0 | focus | 150,000km | 7 | benzin | ford | nein | 2016-04-01 00:00:00 | 0 | 39218 | 2016-04-01 14:38:50 |
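The info() output lists fewer than 50,000 non-null entries for vehicleType, gearbox, model, fuelType and notRepairedDamage. One way to express such gaps as shares is sketched below on a small hand-made frame; the values are illustrative rather than rows from autos.csv, and `sample` and `missing_share` are hypothetical names.

```python
import pandas as pd

# Tiny illustrative frame; None marks a missing entry.
sample = pd.DataFrame({
    "vehicleType": ["bus", None, "kombi", None],
    "gearbox": ["manuell", "manuell", None, "automatik"],
    "brand": ["peugeot", "seat", "ford", "audi"],
})

# isna() gives True/False per cell; the column mean is the share of missing values.
missing_share = sample.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Run against the full frame, the same expression would rank the columns by how incomplete they are.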
autos.columns
Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest', 'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model', 'odometer', 'monthOfRegistration', 'fuelType', 'brand', 'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode', 'lastSeen'], dtype='object')
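def became_snake(a_column_name):
    # Convert a camelCase column name to snake_case.
    # Every capital letter gets its own underscore, so "powerPS" becomes "power_p_s".
    for letter in a_column_name:
        if letter.isupper():
            # Position of this capital in the partly converted name
            pos = a_column_name.index(letter)
            a_column_name = a_column_name[:pos] + "_" + a_column_name[pos:]
    return a_column_name.lower()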
autos_columns_fixed = []
for each in autos.columns:
    autos_columns_fixed.append(became_snake(each))
autos.columns = autos_columns_fixed
autos.head()
date_crawled | name | seller | offer_type | price | abtest | vehicle_type | year_of_registration | gearbox | power_p_s | model | odometer | month_of_registration | fuel_type | brand | not_repaired_damage | date_created | nr_of_pictures | postal_code | last_seen | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-26 17:47:46 | Peugeot_807_160_NAVTECH_ON_BOARD | privat | Angebot | $5,000 | control | bus | 2004 | manuell | 158 | andere | 150,000km | 3 | lpg | peugeot | nein | 2016-03-26 00:00:00 | 0 | 79588 | 2016-04-06 06:45:54 |
1 | 2016-04-04 13:38:56 | BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik | privat | Angebot | $8,500 | control | limousine | 1997 | automatik | 286 | 7er | 150,000km | 6 | benzin | bmw | nein | 2016-04-04 00:00:00 | 0 | 71034 | 2016-04-06 14:45:08 |
2 | 2016-03-26 18:57:24 | Volkswagen_Golf_1.6_United | privat | Angebot | $8,990 | test | limousine | 2009 | manuell | 102 | golf | 70,000km | 7 | benzin | volkswagen | nein | 2016-03-26 00:00:00 | 0 | 35394 | 2016-04-06 20:15:37 |
3 | 2016-03-12 16:58:10 | Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... | privat | Angebot | $4,350 | control | kleinwagen | 2007 | automatik | 71 | fortwo | 70,000km | 6 | benzin | smart | nein | 2016-03-12 00:00:00 | 0 | 33729 | 2016-03-15 03:16:28 |
4 | 2016-04-01 14:38:50 | Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... | privat | Angebot | $1,350 | test | kombi | 2003 | manuell | 0 | focus | 150,000km | 7 | benzin | ford | nein | 2016-04-01 00:00:00 | 0 | 39218 | 2016-04-01 14:38:50 |
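The loop-based became_snake splits at every capital letter, which is why powerPS ends up as power_p_s. As a possible alternative, a regular expression can insert the underscore only where a lowercase letter or digit is followed by a capital. This is a sketch rather than what the notebook uses: the helper name `to_snake` is hypothetical, and the remaining cells still expect the column to be called power_p_s.

```python
import re

def to_snake(name):
    # Insert "_" only at a lowercase/digit -> uppercase boundary, then lowercase everything.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()

examples = ["dateCrawled", "yearOfRegistration", "powerPS", "abtest"]
converted = [to_snake(n) for n in examples]
print(converted)
```

Adopting this helper would mean renaming power_p_s to power_ps in every later cell.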
autos.describe(include='all')
date_crawled | name | seller | offer_type | price | abtest | vehicle_type | year_of_registration | gearbox | power_p_s | model | odometer | month_of_registration | fuel_type | brand | not_repaired_damage | date_created | nr_of_pictures | postal_code | last_seen | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 50000 | 50000 | 50000 | 50000 | 50000 | 50000 | 44905 | 50000.000000 | 47320 | 50000.000000 | 47242 | 50000 | 50000.000000 | 45518 | 50000 | 40171 | 50000 | 50000.0 | 50000.000000 | 50000 |
unique | 48213 | 38754 | 2 | 2 | 2357 | 2 | 8 | NaN | 2 | NaN | 245 | 13 | NaN | 7 | 40 | 2 | 76 | NaN | NaN | 39481 |
top | 2016-04-02 15:49:30 | Ford_Fiesta | privat | Angebot | $0 | test | limousine | NaN | manuell | NaN | golf | 150,000km | NaN | benzin | volkswagen | nein | 2016-04-03 00:00:00 | NaN | NaN | 2016-04-07 06:17:27 |
freq | 3 | 78 | 49999 | 49999 | 1421 | 25756 | 12859 | NaN | 36993 | NaN | 4024 | 32424 | NaN | 30107 | 10687 | 35232 | 1946 | NaN | NaN | 8 |
mean | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2005.073280 | NaN | 116.355920 | NaN | NaN | 5.723360 | NaN | NaN | NaN | NaN | 0.0 | 50813.627300 | NaN |
std | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 105.712813 | NaN | 209.216627 | NaN | NaN | 3.711984 | NaN | NaN | NaN | NaN | 0.0 | 25779.747957 | NaN |
min | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1000.000000 | NaN | 0.000000 | NaN | NaN | 0.000000 | NaN | NaN | NaN | NaN | 0.0 | 1067.000000 | NaN |
25% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1999.000000 | NaN | 70.000000 | NaN | NaN | 3.000000 | NaN | NaN | NaN | NaN | 0.0 | 30451.000000 | NaN |
50% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2003.000000 | NaN | 105.000000 | NaN | NaN | 6.000000 | NaN | NaN | NaN | NaN | 0.0 | 49577.000000 | NaN |
75% | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2008.000000 | NaN | 150.000000 | NaN | NaN | 9.000000 | NaN | NaN | NaN | NaN | 0.0 | 71540.000000 | NaN |
max | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 9999.000000 | NaN | 17700.000000 | NaN | NaN | 12.000000 | NaN | NaN | NaN | NaN | 0.0 | 99998.000000 | NaN |
autos.describe()
year_of_registration | power_p_s | month_of_registration | nr_of_pictures | postal_code | |
---|---|---|---|---|---|
count | 50000.000000 | 50000.000000 | 50000.000000 | 50000.0 | 50000.000000 |
mean | 2005.073280 | 116.355920 | 5.723360 | 0.0 | 50813.627300 |
std | 105.712813 | 209.216627 | 3.711984 | 0.0 | 25779.747957 |
min | 1000.000000 | 0.000000 | 0.000000 | 0.0 | 1067.000000 |
25% | 1999.000000 | 70.000000 | 3.000000 | 0.0 | 30451.000000 |
50% | 2003.000000 | 105.000000 | 6.000000 | 0.0 | 49577.000000 |
75% | 2008.000000 | 150.000000 | 9.000000 | 0.0 | 71540.000000 |
max | 9999.000000 | 17700.000000 | 12.000000 | 0.0 | 99998.000000 |
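power_p_s / month_of_registration
Both columns contain many zeros (5,500 in power_p_s and 5,075 in month_of_registration), which most likely stand in for missing values rather than real measurements.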
autos['power_p_s'].value_counts()
0 5500 75 3171 60 2195 150 2046 140 1884 101 1756 90 1746 116 1646 170 1492 105 1410 125 964 136 955 102 868 163 847 54 759 143 733 131 713 122 710 110 694 109 620 50 604 80 560 177 542 58 506 120 501 115 481 69 475 45 397 95 382 68 380 ... 999 1 455 1 442 1 1082 1 678 1 454 1 187 1 262 1 441 1 585 1 460 1 5867 1 9011 1 268 1 236 1 1367 1 24 1 12 1 1771 1 1003 1 587 1 696 1 952 1 1016 1 682 1 650 1 490 1 362 1 153 1 16312 1 Name: power_p_s, Length: 448, dtype: int64
autos['month_of_registration'].value_counts()
0 5075 3 5071 6 4368 5 4107 4 4102 7 3949 10 3651 12 3447 9 3389 11 3360 1 3282 8 3191 2 3008 Name: month_of_registration, dtype: int64
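If the zeros in power_p_s and month_of_registration are treated as missing, they can be counted and then replaced with NaN so that later statistics ignore them. The sketch below assumes that interpretation and uses a small hand-made frame with illustrative values; `sample`, `zero_share` and `cleaned` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Tiny hand-made sample; the values are illustrative, not taken from autos.csv.
sample = pd.DataFrame({
    "power_p_s": [158, 0, 90, 0, 71],
    "month_of_registration": [3, 0, 7, 12, 0],
})

# Share of rows in which each column equals 0.
zero_share = (sample == 0).mean()

# Treat 0 as missing so that means and counts skip those rows.
cleaned = sample.replace(0, np.nan)

print(zero_share)
print(cleaned.isna().sum())
```

Replacing the zeros turns the integer columns into floats, because NaN is a floating-point value.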
date_crawled / date_created
The two columns are quite similar and need further investigation before either one is dropped.
autos['date_crawled'].head()
0 2016-03-26 17:47:46 1 2016-04-04 13:38:56 2 2016-03-26 18:57:24 3 2016-03-12 16:58:10 4 2016-04-01 14:38:50 Name: date_crawled, dtype: object
autos['offer_type'].head()
0 Angebot 1 Angebot 2 Angebot 3 Angebot 4 Angebot Name: offer_type, dtype: object
autos['seller'].head()
0 privat 1 privat 2 privat 3 privat 4 privat Name: seller, dtype: object
Values that must be integers: price and odometer are currently stored as text (object dtype).
autos['price'].describe()
count 50000 unique 2357 top $0 freq 1421 Name: price, dtype: object
autos['odometer'].describe()
count 50000 unique 13 top 150,000km freq 32424 Name: odometer, dtype: object
Remove any non-numeric characters. Convert the column to a numeric dtype. Use DataFrame.rename() to rename the column to odometer_km.
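The three steps can be sketched on a handful of values copied from the table above. Note that `Series.replace` matches whole cell values, whereas `Series.str.replace` works on substrings, which is what is needed here. The frame name `sample` is hypothetical, and the regular expression `\D` (any non-digit) is one of several ways to strip the symbols.

```python
import pandas as pd

sample = pd.DataFrame({
    "price": ["$5,000", "$8,500", "$999"],
    "odometer": ["150,000km", "70,000km", "5,000km"],
})

# 1. Remove every non-digit character, 2. convert to an integer dtype.
sample["price"] = sample["price"].str.replace(r"\D", "", regex=True).astype(int)
sample["odometer"] = sample["odometer"].str.replace(r"\D", "", regex=True).astype(int)

# 3. Rename the column so the unit stays visible.
sample = sample.rename(columns={"odometer": "odometer_km"})

print(sample)
```

Stripping all non-digits is only safe because neither column contains decimals or negative numbers.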
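# Series.replace only matches whole cell values, so use the .str accessor to strip "$" and "," inside each string.
autos["price"] = autos["price"].str.replace("$", "", regex=False).str.replace(",", "", regex=False).astype(int)
autos["price"].head()
0 5000 1 8500 2 8990 3 4350 4 1350 Name: price, dtype: int64
The price figures recorded further down (for example a maximum of 999) come from a run in which everything after the thousands separator was cut off, so $5,000 was read as 5; they understate the real prices.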
# Strip "km" and the thousands separator, then convert to a numeric dtype.
autos["odometer"] = autos["odometer"].str.replace("km", "", regex=False).str.replace(",", "", regex=False).astype(float)
autos.rename(columns={'odometer': 'odometer_km'}, inplace=True)
autos['odometer_km'].head()
0 150000.0 1 150000.0 2 70000.0 3 70000.0 4 150000.0 Name: odometer_km, dtype: float64
#Price Column
print(autos['price'].max())
autos['price'].min()
999
0
autos['price'].unique().shape
(330,)
autos['price'].describe()
count 50000.000000 mean 120.260500 std 258.059503 min 0.000000 25% 2.000000 50% 5.000000 75% 18.000000 max 999.000000 Name: price, dtype: float64
autos['price'].value_counts().sort_index(ascending=False).head(20)
999 437 998 5 996 1 995 5 990 147 989 2 985 4 980 48 975 2 970 7 965 2 960 1 958 1 951 1 950 379 949 11 945 2 940 2 930 4 925 1 Name: price, dtype: int64
check_price = autos["price"].between(0,500)
check_price.describe()
count 50000 unique 2 top True freq 44307 Name: price, dtype: object
autos["price"].value_counts().sort_index(ascending=True).head(20)
0 1421 1 8692 2 5663 3 4165 4 3036 5 2368 6 2084 7 1807 8 1505 9 1306 10 918 11 787 12 811 13 665 14 605 15 523 16 478 17 404 18 318 19 322 Name: price, dtype: int64
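A between() mask like check_price can also be used to measure how many listings fall inside a range and to keep only those rows. The sketch below uses illustrative prices and hypothetical bounds of 1 and 20,000; neither the bounds nor the names `prices`, `in_range` and `filtered` come from the notebook.

```python
import pandas as pd

# Illustrative prices, not the full column.
prices = pd.Series([0, 250, 999, 5000, 18900, 48500], name="price")

# between() is inclusive on both ends by default.
in_range = prices.between(1, 20000)

# The mean of a boolean mask is the share of True values.
share_kept = in_range.mean()

# Boolean indexing keeps only the rows inside the range.
filtered = prices[in_range]

print(share_kept)
print(filtered.tolist())
```

Checking the share first shows how much data a given cut-off would discard before any rows are removed.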
#Odometer_km Column
print(autos['odometer_km'].max())
autos['odometer_km'].min()
150000.0
5000.0
autos['odometer_km'].unique().shape
(13,)
autos['odometer_km'].describe()
count 50000.000000 mean 125732.700000 std 40042.211706 min 5000.000000 25% 125000.000000 50% 150000.000000 75% 150000.000000 max 150000.000000 Name: odometer_km, dtype: float64
check_odometer = autos["odometer_km"].between(0,20000)
check_odometer.describe()
count 50000 unique 2 top False freq 47985 Name: odometer_km, dtype: object
autos['odometer_km'].value_counts().sort_index(ascending=False).head(20)
150000.0 32424 125000.0 5170 100000.0 2169 90000.0 1757 80000.0 1436 70000.0 1230 60000.0 1164 50000.0 1027 40000.0 819 30000.0 789 20000.0 784 10000.0 264 5000.0 967 Name: odometer_km, dtype: int64
autos['odometer_km'].value_counts().sort_index(ascending=True).head(20)
5000.0 967 10000.0 264 20000.0 784 30000.0 789 40000.0 819 50000.0 1027 60000.0 1164 70000.0 1230 80000.0 1436 90000.0 1757 100000.0 2169 125000.0 5170 150000.0 32424 Name: odometer_km, dtype: int64
# Listings with a price of 0 are not removed here; autos = autos[autos['price'] != 0] would drop them.
autos[['date_crawled','date_created','last_seen']][0:5]
date_crawled | date_created | last_seen | |
---|---|---|---|
0 | 2016-03-26 17:47:46 | 2016-03-26 00:00:00 | 2016-04-06 06:45:54 |
1 | 2016-04-04 13:38:56 | 2016-04-04 00:00:00 | 2016-04-06 14:45:08 |
2 | 2016-03-26 18:57:24 | 2016-03-26 00:00:00 | 2016-04-06 20:15:37 |
3 | 2016-03-12 16:58:10 | 2016-03-12 00:00:00 | 2016-03-15 03:16:28 |
4 | 2016-04-01 14:38:50 | 2016-04-01 00:00:00 | 2016-04-01 14:38:50 |
autos['date_created'].value_counts(normalize=True, dropna=False).sort_index(ascending=True)
2015-06-11 00:00:00 0.00002 2015-08-10 00:00:00 0.00002 2015-09-09 00:00:00 0.00002 2015-11-10 00:00:00 0.00002 2015-12-05 00:00:00 0.00002 2015-12-30 00:00:00 0.00002 2016-01-03 00:00:00 0.00002 2016-01-07 00:00:00 0.00002 2016-01-10 00:00:00 0.00004 2016-01-13 00:00:00 0.00002 2016-01-14 00:00:00 0.00002 2016-01-16 00:00:00 0.00002 2016-01-22 00:00:00 0.00002 2016-01-27 00:00:00 0.00006 2016-01-29 00:00:00 0.00002 2016-02-01 00:00:00 0.00002 2016-02-02 00:00:00 0.00004 2016-02-05 00:00:00 0.00004 2016-02-07 00:00:00 0.00002 2016-02-08 00:00:00 0.00002 2016-02-09 00:00:00 0.00004 2016-02-11 00:00:00 0.00002 2016-02-12 00:00:00 0.00006 2016-02-14 00:00:00 0.00004 2016-02-16 00:00:00 0.00002 2016-02-17 00:00:00 0.00002 2016-02-18 00:00:00 0.00004 2016-02-19 00:00:00 0.00006 2016-02-20 00:00:00 0.00004 2016-02-21 00:00:00 0.00006 ... 2016-03-09 00:00:00 0.03324 2016-03-10 00:00:00 0.03186 2016-03-11 00:00:00 0.03278 2016-03-12 00:00:00 0.03662 2016-03-13 00:00:00 0.01692 2016-03-14 00:00:00 0.03522 2016-03-15 00:00:00 0.03374 2016-03-16 00:00:00 0.03000 2016-03-17 00:00:00 0.03120 2016-03-18 00:00:00 0.01372 2016-03-19 00:00:00 0.03384 2016-03-20 00:00:00 0.03786 2016-03-21 00:00:00 0.03772 2016-03-22 00:00:00 0.03280 2016-03-23 00:00:00 0.03218 2016-03-24 00:00:00 0.02908 2016-03-25 00:00:00 0.03188 2016-03-26 00:00:00 0.03256 2016-03-27 00:00:00 0.03090 2016-03-28 00:00:00 0.03496 2016-03-29 00:00:00 0.03414 2016-03-30 00:00:00 0.03344 2016-03-31 00:00:00 0.03192 2016-04-01 00:00:00 0.03380 2016-04-02 00:00:00 0.03508 2016-04-03 00:00:00 0.03892 2016-04-04 00:00:00 0.03688 2016-04-05 00:00:00 0.01184 2016-04-06 00:00:00 0.00326 2016-04-07 00:00:00 0.00128 Name: date_created, Length: 76, dtype: float64
autos['last_seen'].value_counts(normalize=True, dropna=False).sort_index(ascending=True)
2016-03-05 14:45:46 0.00002 2016-03-05 14:46:02 0.00002 2016-03-05 14:49:34 0.00002 2016-03-05 15:16:11 0.00002 2016-03-05 15:16:47 0.00002 2016-03-05 15:28:10 0.00002 2016-03-05 15:41:30 0.00002 2016-03-05 15:45:43 0.00002 2016-03-05 15:47:38 0.00002 2016-03-05 15:47:44 0.00002 2016-03-05 16:45:57 0.00002 2016-03-05 16:47:28 0.00002 2016-03-05 17:15:45 0.00002 2016-03-05 17:16:12 0.00002 2016-03-05 17:16:14 0.00002 2016-03-05 17:16:23 0.00002 2016-03-05 17:17:02 0.00002 2016-03-05 17:39:19 0.00002 2016-03-05 17:40:14 0.00002 2016-03-05 17:44:50 0.00002 2016-03-05 17:44:54 0.00002 2016-03-05 17:46:01 0.00002 2016-03-05 18:17:58 0.00002 2016-03-05 18:47:14 0.00002 2016-03-05 18:50:38 0.00002 2016-03-05 19:15:08 0.00002 2016-03-05 19:15:20 0.00002 2016-03-05 19:15:42 0.00002 2016-03-05 19:16:36 0.00002 2016-03-05 19:17:17 0.00002 ... 2016-04-07 14:58:09 0.00004 2016-04-07 14:58:10 0.00004 2016-04-07 14:58:12 0.00004 2016-04-07 14:58:13 0.00002 2016-04-07 14:58:14 0.00002 2016-04-07 14:58:17 0.00006 2016-04-07 14:58:18 0.00010 2016-04-07 14:58:20 0.00002 2016-04-07 14:58:21 0.00006 2016-04-07 14:58:22 0.00002 2016-04-07 14:58:24 0.00004 2016-04-07 14:58:25 0.00002 2016-04-07 14:58:26 0.00004 2016-04-07 14:58:27 0.00004 2016-04-07 14:58:28 0.00004 2016-04-07 14:58:29 0.00006 2016-04-07 14:58:31 0.00004 2016-04-07 14:58:33 0.00004 2016-04-07 14:58:34 0.00004 2016-04-07 14:58:36 0.00006 2016-04-07 14:58:37 0.00002 2016-04-07 14:58:38 0.00002 2016-04-07 14:58:40 0.00002 2016-04-07 14:58:41 0.00002 2016-04-07 14:58:42 0.00004 2016-04-07 14:58:44 0.00006 2016-04-07 14:58:45 0.00002 2016-04-07 14:58:46 0.00002 2016-04-07 14:58:48 0.00006 2016-04-07 14:58:50 0.00008 Name: last_seen, Length: 39481, dtype: float64
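Because last_seen and date_crawled carry a time of day, counting the raw strings yields tens of thousands of distinct values. Keeping only the first ten characters (the YYYY-MM-DD part) gives a per-day distribution and also makes date_crawled directly comparable with date_created. The sketch below uses four rows copied from the table above; `sample`, `crawl_day` and the other names are hypothetical.

```python
import pandas as pd

sample = pd.DataFrame({
    "date_crawled": ["2016-03-26 17:47:46", "2016-04-04 13:38:56",
                     "2016-03-26 18:57:24", "2016-03-15 01:41:36"],
    "date_created": ["2016-03-26 00:00:00", "2016-04-04 00:00:00",
                     "2016-03-26 00:00:00", "2016-03-14 00:00:00"],
})

# Keep only the date part of each timestamp string.
crawl_day = sample["date_crawled"].str[:10]
created_day = sample["date_created"].str[:10]

# Share of listings crawled on the same day they were created.
same_day_share = (crawl_day == created_day).mean()

# Per-day distribution, in chronological order.
daily_share = crawl_day.value_counts(normalize=True).sort_index()

print(same_day_share)
print(daily_share)
```

A high same-day share would support treating the two columns as near-duplicates, while the remaining rows show where they differ.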
autos['year_of_registration'].describe()
count    50000.000000
mean      2005.073280
std        105.712813
min       1000.000000
25%       1999.000000
50%       2003.000000
75%       2008.000000
max       9999.000000
Name: year_of_registration, dtype: float64
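The minimum (1000) and maximum (9999) are clearly not real registration years. After asking around among my friends, I settled on keeping only cars registered between 2000 and 2020.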
autos = autos[autos['year_of_registration'].between(2000, 2020)]
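This filter is fairly aggressive: the row count goes from 50,000 before it to 36,099 after it, so roughly 28% of the listings are dropped. A small sketch of how one might inspect what a range filter removes before applying it; the `sample` frame and the names `in_range`, `kept_share` and `dropped` are hypothetical:

```python
import pandas as pd

# Hypothetical sample; the real column is autos['year_of_registration'].
sample = pd.DataFrame(
    {"year_of_registration": [1000, 1997, 2000, 2004, 2016, 2020, 9999]}
)

# between() includes both endpoints by default.
in_range = sample["year_of_registration"].between(2000, 2020)

# Fraction of rows that would survive the filter.
kept_share = in_range.mean()

# Years that would be removed, with how often each occurs.
dropped = (
    sample.loc[~in_range, "year_of_registration"]
    .value_counts()
    .sort_index()
)

print(f"kept {kept_share:.1%} of rows")
print(dropped)
```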
autos.describe()
| | price | year_of_registration | power_p_s | odometer_km | month_of_registration | nr_of_pictures | postal_code |
---|---|---|---|---|---|---|---|
count | 36099.000000 | 36099.000000 | 36099.000000 | 36099.000000 | 36099.000000 | 36099.0 | 36099.000000 |
mean | 67.809164 | 2006.685947 | 124.696584 | 121903.930857 | 5.858223 | 0.0 | 51364.343528 |
std | 200.620293 | 4.959006 | 239.628867 | 41499.336584 | 3.659947 | 0.0 | 25504.212094 |
min | 0.000000 | 2000.000000 | 0.000000 | 5000.000000 | 0.000000 | 0.0 | 1067.000000 |
25% | 2.000000 | 2003.000000 | 75.000000 | 100000.000000 | 3.000000 | 0.0 | 31139.000000 |
50% | 5.000000 | 2006.000000 | 115.000000 | 150000.000000 | 6.000000 | 0.0 | 50374.000000 |
75% | 12.000000 | 2010.000000 | 150.000000 | 150000.000000 | 9.000000 | 0.0 | 71706.000000 |
max | 999.000000 | 2019.000000 | 17700.000000 | 150000.000000 | 12.000000 | 0.0 | 99998.000000 |
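The `price` summary above looks suspicious: the median is 5 and the maximum is 999, while the raw listings show values such as `$5,000` and `$8,500`. One possible explanation (an assumption, since the cleaning step is not part of this section) is that the thousands separator was treated as a decimal point when the column was converted. A hedged sketch of a conversion that removes both the currency symbol and the separator before casting; the values `$0` and `$12,500` are made up for illustration:

```python
import pandas as pd

# "$5,000" and "$8,500" appear in the raw data; the other two are hypothetical.
raw_price = pd.Series(["$5,000", "$8,500", "$0", "$12,500"])

# Strip the literal "$" and "," characters (regex=False treats them as plain text),
# then cast the remaining digits to integers.
price = (
    raw_price
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(int)
)
print(price.tolist())
```

If the real column was converted differently, the brand averages computed from it would need to be rechecked.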
autos["brand"].unique().shape
(40,)
# value_counts() already sorts by frequency (descending), so the first 20 entries are the 20 most common brands
brands = autos['brand'].value_counts(normalize=True).index[:20]
brands
Index(['volkswagen', 'bmw', 'opel', 'mercedes_benz', 'audi', 'ford', 'renault', 'peugeot', 'fiat', 'seat', 'skoda', 'smart', 'citroen', 'nissan', 'mazda', 'toyota', 'hyundai', 'mini', 'kia', 'sonstige_autos'], dtype='object')
price_by_brand = {}
for brand in brands:
    # Mean listing price over all rows of this brand
    price_mean = autos.loc[autos['brand'] == brand, 'price'].mean()
    price_by_brand[brand] = price_mean
price_by_brand
{'audi': 25.862237122828407, 'bmw': 25.051572975760703, 'citroen': 92.29867986798679, 'fiat': 154.78690127077223, 'ford': 124.26366435719784, 'hyundai': 60.42265795206972, 'kia': 71.8259587020649, 'mazda': 79.68545454545455, 'mercedes_benz': 30.105216360403084, 'mini': 10.717391304347826, 'nissan': 100.78918918918919, 'opel': 106.43807713199348, 'peugeot': 94.55574324324324, 'renault': 151.2792946530148, 'seat': 76.77807848443842, 'skoda': 26.955040871934607, 'smart': 34.41284403669725, 'sonstige_autos': 59.965986394557824, 'toyota': 19.217741935483872, 'volkswagen': 49.97590189647797}
bmp_series = pd.Series(price_by_brand)
print(bmp_series)
audi 25.862237 bmw 25.051573 citroen 92.298680 fiat 154.786901 ford 124.263664 hyundai 60.422658 kia 71.825959 mazda 79.685455 mercedes_benz 30.105216 mini 10.717391 nissan 100.789189 opel 106.438077 peugeot 94.555743 renault 151.279295 seat 76.778078 skoda 26.955041 smart 34.412844 sonstige_autos 59.965986 toyota 19.217742 volkswagen 49.975902 dtype: float64
df = pd.DataFrame(bmp_series, columns=['mean_price'])
df
| | mean_price |
---|---|
audi | 25.862237 |
bmw | 25.051573 |
citroen | 92.298680 |
fiat | 154.786901 |
ford | 124.263664 |
hyundai | 60.422658 |
kia | 71.825959 |
mazda | 79.685455 |
mercedes_benz | 30.105216 |
mini | 10.717391 |
nissan | 100.789189 |
opel | 106.438077 |
peugeot | 94.555743 |
renault | 151.279295 |
seat | 76.778078 |
skoda | 26.955041 |
smart | 34.412844 |
sonstige_autos | 59.965986 |
toyota | 19.217742 |
volkswagen | 49.975902 |
km_by_brand = {}
for brand in brands:
    # Mean odometer reading (km) over all rows of this brand
    km_mean = autos.loc[autos['brand'] == brand, 'odometer_km'].mean()
    km_by_brand[brand] = km_mean
km_by_brand
{'audi': 124975.61718988113, 'bmw': 128436.04951005674, 'citroen': 118613.86138613861, 'fiat': 116041.05571847508, 'ford': 123267.89838337182, 'hyundai': 105087.14596949892, 'kia': 111474.92625368731, 'mazda': 122272.72727272728, 'mercedes_benz': 127190.27860106698, 'mini': 89142.51207729468, 'nissan': 112648.64864864865, 'opel': 126195.00271591527, 'peugeot': 124987.33108108108, 'renault': 125116.60978384528, 'seat': 118795.66982408661, 'skoda': 110000.0, 'smart': 99434.25076452599, 'sonstige_autos': 94047.61904761905, 'toyota': 110776.20967741935, 'volkswagen': 123739.48381577071}
df_series = pd.Series(km_by_brand)
d = {'mean_price' : bmp_series, 'mean_mileage' : df_series}
dff = pd.DataFrame(data=d)
dff
| | mean_mileage | mean_price |
---|---|---|
audi | 124975.617190 | 25.862237 |
bmw | 128436.049510 | 25.051573 |
citroen | 118613.861386 | 92.298680 |
fiat | 116041.055718 | 154.786901 |
ford | 123267.898383 | 124.263664 |
hyundai | 105087.145969 | 60.422658 |
kia | 111474.926254 | 71.825959 |
mazda | 122272.727273 | 79.685455 |
mercedes_benz | 127190.278601 | 30.105216 |
mini | 89142.512077 | 10.717391 |
nissan | 112648.648649 | 100.789189 |
opel | 126195.002716 | 106.438077 |
peugeot | 124987.331081 | 94.555743 |
renault | 125116.609784 | 151.279295 |
seat | 118795.669824 | 76.778078 |
skoda | 110000.000000 | 26.955041 |
smart | 99434.250765 | 34.412844 |
sonstige_autos | 94047.619048 | 59.965986 |
toyota | 110776.209677 | 19.217742 |
volkswagen | 123739.483816 | 49.975902 |
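The two loops and the intermediate dictionaries can also be expressed as a single `groupby` with named aggregation (available since pandas 0.25). This is an alternative sketch, not the approach used above; the `sample` frame, its values and the names `top_brands` and `summary` are hypothetical:

```python
import pandas as pd

# Hypothetical miniature stand-in for the cleaned autos DataFrame.
sample = pd.DataFrame({
    "brand": ["bmw", "bmw", "fiat", "fiat", "kia"],
    "price": [10.0, 20.0, 100.0, 200.0, 50.0],
    "odometer_km": [150000, 100000, 125000, 100000, 50000],
})

# Most frequent brands (top 2 here; the notebook keeps the top 20).
top_brands = sample["brand"].value_counts().index[:2]

# One pass over the data: mean price and mean mileage per brand,
# sorted so the most expensive brands come first.
summary = (
    sample[sample["brand"].isin(top_brands)]
    .groupby("brand")
    .agg(mean_price=("price", "mean"), mean_mileage=("odometer_km", "mean"))
    .sort_values("mean_price", ascending=False)
)
print(summary)
```

Sorting by `mean_price` makes it easier to see whether higher average mileage goes along with lower average price.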