Goal: visualize the relationship between the topsoil lead concentration (lead
column, on the y-axis) and the topsoil cadmium concentration (cadmium
column, on the x-axis).
### Necessary previous steps
# import packages
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
# import dataset
data = pd.read_csv("meuse.csv")
# build the model
# scikit-learn expects a 2-D feature array, so reshape the cadmium column to (n_samples, 1);
# go through .values because Series.reshape is deprecated
lr = LinearRegression().fit(data.cadmium.values.reshape((-1, 1)), data.lead)
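The reshape step is easy to get wrong, so here is a minimal sketch of what it does. The toy values below are hypothetical and are not taken from meuse.csv; they are chosen so that lead = 2 * cadmium + 1 exactly. The names `toy`, `x_1d`, `x_2d`, `x_df` and `toy_lr` are only for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy values, NOT from meuse.csv: lead = 2 * cadmium + 1
toy = pd.DataFrame({"cadmium": [1.0, 2.0, 3.0, 4.0],
                    "lead": [3.0, 5.0, 7.0, 9.0]})

x_1d = toy.cadmium.values      # shape (4,): a flat array of 4 numbers
x_2d = x_1d.reshape((-1, 1))   # shape (4, 1): 4 samples, 1 feature
x_df = toy[["cadmium"]]        # double brackets also give a 2-D object, shape (4, 1)

# fit() wants the 2-D form for the features; the target can stay 1-D
toy_lr = LinearRegression().fit(x_2d, toy.lead)
print(x_1d.shape, x_2d.shape, x_df.shape)
print(toy_lr.coef_, toy_lr.intercept_)
```

Selecting with double brackets (`toy[["cadmium"]]`) is an alternative that avoids the reshape altogether, at the cost of passing a DataFrame instead of a plain array.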
We use the matplotlib package to visualize:
# reference: https://becominghuman.ai/implementing-and-visualizing-linear-regression-in-python-with-scikit-learn-a073768dc688
import matplotlib.pyplot as plt
# scatter the observations, then draw the fitted line on top
# (reshape through .values, since Series.reshape is deprecated)
plt.scatter(data.cadmium, data.lead, color = "red")
plt.plot(data.cadmium, lr.predict(data.cadmium.values.reshape((-1, 1))), color = "green")
plt.title("Lead vs Cadmium")
plt.xlabel("Cadmium")
plt.ylabel("Lead")
plt.show()
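Drawing the line through the observed cadmium values works here because the model is a straight line. A possible variant, sketched below on hypothetical toy data (not meuse.csv), predicts on an evenly spaced grid and writes the figure to a file instead of showing it. The file name `lead_vs_cadmium.png` and the names `grid` and `toy_lr` are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy values, NOT from meuse.csv
cadmium = np.array([1.0, 2.0, 4.0, 8.0])
lead = np.array([80.0, 120.0, 190.0, 350.0])

toy_lr = LinearRegression().fit(cadmium.reshape((-1, 1)), lead)

# 50 evenly spaced x values between the smallest and largest observation
grid = np.linspace(cadmium.min(), cadmium.max(), 50).reshape((-1, 1))

fig, ax = plt.subplots()
ax.scatter(cadmium, lead, color="red")                       # observations
ax.plot(grid.ravel(), toy_lr.predict(grid), color="green")   # fitted line
ax.set_title("Lead vs Cadmium")
ax.set_xlabel("Cadmium")
ax.set_ylabel("Lead")
fig.savefig("lead_vs_cadmium.png")
```

Predicting on a grid makes no visible difference for a straight line, but it keeps the curve smooth if the model is later swapped for a non-linear one.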
See the next post for how we analyze the model.