#!/usr/bin/env python
# coding: utf-8

# # Analyzing Linear Regression Model in Python

# ### Analyze a Simple Linear Model

# Goal: Obtain a statistical summary of the linear regression line relating the topsoil lead concentration (`lead` column, y-axis) to the topsoil cadmium concentration (`cadmium` column, x-axis).

# In[1]:


### Previous steps necessary
# import packages
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
# import dataset
data = pd.read_csv("meuse.csv")
# build and fit the model; scikit-learn expects a 2-D feature matrix,
# so `cadmium` is selected as a one-column DataFrame (a pandas Series has no `.reshape`)
lr = LinearRegression().fit(data[["cadmium"]], data["lead"])
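# The fitted line is stored in the estimator's `intercept_` and `coef_` attributes. The sketch below uses small hypothetical toy values (not the meuse measurements) that lie exactly on a line, so the recovered intercept and slope can be checked by eye. It assumes only NumPy and scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data standing in for meuse.csv (not real measurements)
cadmium = np.array([0.5, 1.0, 2.0, 3.0, 4.0])
lead = 50.0 + 30.0 * cadmium  # exactly linear: intercept 50, slope 30

# scikit-learn expects X with shape (n_samples, n_features)
X = cadmium.reshape(-1, 1)
model = LinearRegression().fit(X, lead)

# intercept_ is a scalar; coef_ holds one slope per feature
print(model.intercept_, model.coef_[0])
```

# With real data the points scatter around the line, so the coefficients are least-squares estimates rather than exact values.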


# ### R-squared ($R^2$)
# R-squared measures how closely the data fit the regression line: it is the proportion of the variance in the response (`lead`) that the model explains.
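# Concretely, $R^2 = 1 - SS_{res} / SS_{tot}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the total sum of squares around the mean of y. The sketch below computes this by hand on hypothetical toy data and compares it with `LinearRegression.score`, which returns $R^2$ for the given inputs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (illustrative only, not from meuse.csv)
x = np.array([0.2, 0.8, 1.5, 2.1, 3.3, 4.0]).reshape(-1, 1)
y = np.array([40.0, 70.0, 85.0, 120.0, 150.0, 190.0])

lr_toy = LinearRegression().fit(x, y)
y_hat = lr_toy.predict(x)

ss_res = np.sum((y - y_hat) ** 2)      # variation left unexplained by the line
ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, lr_toy.score(x, y))   # the two values should agree
```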

# In[7]:


# `score` returns R^2 of the model on the given data
print(lr.score(data[["cadmium"]], data["lead"]))


# Here, the score indicates that about 63% of the variance in `lead` is explained by the linear model `lr` fitted on `cadmium`.
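# For a fuller statistical summary of a single-predictor line, one option (a swapped-in alternative, not the approach used above) is `scipy.stats.linregress`, which reports the slope, intercept, correlation coefficient, p-value, and the standard error of the slope. The sketch uses hypothetical toy values and also checks that, for simple regression with an intercept, $R^2$ equals the squared Pearson correlation.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (illustrative only, not the meuse measurements)
cadmium = np.array([0.2, 0.8, 1.5, 2.1, 3.3, 4.0])
lead = np.array([40.0, 70.0, 85.0, 120.0, 150.0, 190.0])

# Least-squares fit of lead = intercept + slope * cadmium, with inference statistics
res = stats.linregress(cadmium, lead)
print(f"slope={res.slope:.3f}  intercept={res.intercept:.3f}")
print(f"r={res.rvalue:.3f}  R^2={res.rvalue ** 2:.3f}  "
      f"p-value={res.pvalue:.2e}  slope SE={res.stderr:.3f}")

# Same fit with scikit-learn, for comparison
X = cadmium.reshape(-1, 1)
sk_model = LinearRegression().fit(X, lead)
r2_sklearn = sk_model.score(X, lead)

# In simple linear regression, R^2 equals the squared correlation r^2
print(np.isclose(res.rvalue ** 2, r2_sklearn))
```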

# For more advanced topics, please refer to our later posts on advanced Linear Regression Modeling.