%matplotlib inline
import pandas as pd
data = pd.read_csv('https://github.com/albahnsen/PracticalMachineLearningClass/raw/master/datasets/dataTrain_carListings.zip')
data.head()
data.shape
data.Price.describe()
data.plot(kind='scatter', y='Price', x='Year')
data.plot(kind='scatter', y='Price', x='Mileage')
data.columns
Develop a machine learning model that predicts the price of a car, using ['Year', 'Mileage', 'State', 'Make', 'Model'] as inputs.
Submit the predictions for the test set to Kaggle: https://www.kaggle.com/c/miia4200-20191-p1-usedcarpriceprediction
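One possible starting point, not the assignment's reference solution: a minimal scikit-learn sketch that one-hot encodes State, Make and Model, passes Year and Mileage through unchanged, and fits a random forest. The helper names `build_price_model` and `holdout_rmse`, the choice of a random forest, and the hyperparameters are all assumptions made for illustration. RMSE on a hold-out split is used here as a generic check; the competition's own metric may differ.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

NUMERIC = ['Year', 'Mileage']
CATEGORICAL = ['State', 'Make', 'Model']


def build_price_model(n_estimators=50, random_state=42):
    """Hypothetical helper: preprocessing plus a random forest in one pipeline."""
    preprocess = ColumnTransformer([
        # handle_unknown='ignore' encodes categories unseen during fit as all zeros
        ('cat', OneHotEncoder(handle_unknown='ignore'), CATEGORICAL),
        ('num', 'passthrough', NUMERIC),
    ])
    forest = RandomForestRegressor(n_estimators=n_estimators,
                                   random_state=random_state)
    return Pipeline([('preprocess', preprocess), ('forest', forest)])


def holdout_rmse(df, test_size=0.2, random_state=42):
    """Hypothetical helper: fit on a train split, return (model, RMSE on the rest)."""
    X = df[NUMERIC + CATEGORICAL]
    y = df['Price']
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    model = build_price_model(random_state=random_state).fit(X_train, y_train)
    errors = model.predict(X_val) - y_val.to_numpy()
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    return model, rmse
```

With the training frame loaded above, `model, rmse = holdout_rmse(data)` would fit on 80% of the rows and report the error on the remaining 20%. On a large training set this may be slow; fitting on `data.sample(frac=0.1, random_state=42)` first is a reasonable way to iterate.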
data_test = pd.read_csv('https://github.com/albahnsen/PracticalMachineLearningClass/raw/master/datasets/dataTest_carListings.zip', index_col=0)
data_test.head()
data_test.shape
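import numpy as np
# Placeholder submission: uniform random prices in [5000, 80000), not a real model.
# The fixed seed only makes the example file reproducible.
np.random.seed(42)
y_pred = pd.DataFrame(np.random.rand(data_test.shape[0]) * 75000 + 5000, index=data_test.index, columns=['Price'])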
y_pred.to_csv('test_submission.csv', index_label='ID')
y_pred.head()
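The random prices above only demonstrate the submission layout. As a sketch, assuming a fitted model object with a scikit-learn style `predict` method and a test frame that carries the feature columns the model expects, a hypothetical helper `make_submission` could write real predictions in the same ID, Price format:

```python
import pandas as pd


def make_submission(model, test_df, path='test_submission.csv'):
    """Hypothetical helper: predict prices for test_df and save them as ID,Price."""
    y_pred = pd.DataFrame(model.predict(test_df),
                          index=test_df.index, columns=['Price'])
    # index_label='ID' names the index column, matching the file written above
    y_pred.to_csv(path, index_label='ID')
    return y_pred
```

Calling `make_submission(model, data_test)` would then overwrite `test_submission.csv` with the model's predictions instead of random values.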