#!/usr/bin/env python
# coding: utf-8

# # Analyzing Linear Regression Model in Python

# ### Analyze a Simple Linear Model

# Goal: Obtain a statistical summary of the linear regression line relating the topsoil lead concentration (`lead` column, on the y-axis) to the topsoil cadmium concentration (`cadmium` column, on the x-axis).

# In[1]:


### Previous steps necessary
# import packages
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

# import dataset
data = pd.read_csv("meuse.csv")

# scikit-learn expects a 2-D feature array; a pandas Series has no `reshape`,
# so convert the column to a NumPy array before reshaping it to one column
X = data.cadmium.to_numpy().reshape((-1, 1))

# build the model
regression_model = LinearRegression()
lr = regression_model.fit(X, data.lead)


# ### R^2

# R-squared measures how closely the data fit the regression line.

# In[7]:


print(lr.score(X, data.lead))


# Here, we can conclude that about 63% of the variance in `lead` can be explained by the linear model `lr` fitted on `cadmium`.

# For more advanced topics, please refer to our later posts on Linear Regression Modeling.
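
# The R-squared value that `lr.score` reports above is defined as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean of y. The sketch below computes the same quantity by hand so each term is visible. It is only an illustration: it deliberately uses plain Python instead of scikit-learn, the helper names `simple_linear_fit` and `r_squared` are hypothetical, and the toy values are made up rather than taken from `meuse.csv`.

```python
# Toy data (assumption: made-up values, not taken from meuse.csv).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def simple_linear_fit(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Return R^2 = 1 - SS_res / SS_tot for the fitted line."""
    mean_y = sum(y) / len(y)
    # SS_res: squared distances between observed and predicted y.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    # SS_tot: squared distances between observed y and its mean.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


slope, intercept = simple_linear_fit(x, y)
r2 = r_squared(x, y, slope, intercept)
print("slope:", slope)
print("intercept:", intercept)
print("R^2:", r2)
```

# For a least-squares line with an intercept, this value should match what `LinearRegression.score` returns on the same data. A value near 1 means the residuals are small relative to the overall spread of y.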