In supervised learning, regression refers to modelling and predicting a continuous target variable, and one of the most basic regression techniques is linear regression.
Linear regression is usually among the first topics covered when learning predictive modeling. The dependent variable (or target) is continuous, while the independent variables (or predictors) can be continuous or discrete. The fitted relationship is linear, so it can be drawn as a straight line on a graph.
Linear regression establishes a relationship between a dependent variable (Y) and one or more independent variables (X) using a best-fit straight line.
With a single predictor, the model is represented by the equation Y = a*X + b, where a is the slope of the line and b is the intercept. Once a and b are estimated, this equation can be used to predict the value of the target variable from the given predictor variable(s).
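To make the formula concrete, here is a minimal sketch that applies Y = a*X + b with made-up values for the slope and intercept (the numbers are purely illustrative and not estimated from any dataset):
import numpy as np
a, b = 0.05, 7.0                       # hypothetical slope and intercept
X = np.array([10.0, 100.0, 250.0])     # example predictor values
Y_pred = a * X + b                     # predicted target values
print(Y_pred)                          # [  7.5  12.   19.5]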
In Python, a couple of libraries can be used to perform linear regression. One of them is statsmodels, which provides modules for fitting linear regression by Ordinary Least Squares (OLS). Alternatively, we can use scikit-learn (sklearn), which provides Python implementations of a wide range of machine learning algorithms.
In this demonstration notebook we will see examples of both a statsmodels-based model and a sklearn-based model.
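As a quick preview of the two approaches, here is a minimal sketch that fits the same simple regression with both libraries on a small made-up dataset; both should recover essentially the same intercept and slope:
# Sketch: the same fit with statsmodels (formula API) and with sklearn.
# The toy data below is made up purely for illustration.
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.linear_model import LinearRegression
toy = pd.DataFrame({'x': [1, 2, 3, 4, 5],
                    'y': [2.1, 3.9, 6.2, 8.1, 9.8]})
# statsmodels: Ordinary Least Squares via an R-style formula
ols_fit = smf.ols('y ~ x', data=toy).fit()
print(ols_fit.params)                 # intercept and slope
# sklearn: the same model via the estimator API
lr = LinearRegression().fit(toy[['x']], toy['y'])
print(lr.intercept_, lr.coef_)        # should match the statsmodels estimates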
Import the required libraries and set up the notebook. Here we will be using:
Pandas and NumPy for data manipulation and arrays
Matplotlib (pyplot), Seaborn, Plotly and bqplot for visualizations
scikit-learn (sklearn) and statsmodels for machine learning algorithms and statistics
and some additional helpers.
# %load ../standard_import.txt
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
import seaborn as sns
from sklearn.preprocessing import scale
import sklearn.linear_model as skl_lm
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.api as sm
import statsmodels.formula.api as smf
%matplotlib inline
plt.style.use('seaborn-white')
The example datasets in this tutorial come from the book An Introduction to Statistical Learning with Applications in R.
We will be using the Auto.csv and Advertising.csv datasets.
advertising = pd.read_csv('https://raw.githubusercontent.com/colaberry/DSin100days/master/data/Advertising.csv', usecols=[1,2,3,4])
advertising.info()
advertising.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 4 columns):
TV           200 non-null float64
Radio        200 non-null float64
Newspaper    200 non-null float64
Sales        200 non-null float64
dtypes: float64(4)
memory usage: 6.3 KB
 | TV | Radio | Newspaper | Sales
---|---|---|---|---
0 | 230.1 | 37.8 | 69.2 | 22.1 |
1 | 44.5 | 39.3 | 45.1 | 10.4 |
2 | 17.2 | 45.9 | 69.3 | 9.3 |
3 | 151.5 | 41.3 | 58.5 | 18.5 |
4 | 180.8 | 10.8 | 58.4 | 12.9 |
auto = pd.read_csv('https://raw.githubusercontent.com/colaberry/DSin100days/master/data/Auto.csv', na_values='?').dropna()
auto.info()
auto.head()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 392 entries, 0 to 396
Data columns (total 9 columns):
mpg             392 non-null float64
cylinders       392 non-null int64
displacement    392 non-null float64
horsepower      392 non-null float64
weight          392 non-null int64
acceleration    392 non-null float64
year            392 non-null int64
origin          392 non-null int64
name            392 non-null object
dtypes: float64(4), int64(4), object(1)
memory usage: 30.6+ KB
 | mpg | cylinders | displacement | horsepower | weight | acceleration | year | origin | name
---|---|---|---|---|---|---|---|---|---
0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
3 | 16.0 | 8 | 304.0 | 150.0 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
4 | 17.0 | 8 | 302.0 | 140.0 | 3449 | 10.5 | 70 | 1 | ford torino |
sns.regplot(advertising.TV, advertising.Sales, order=1, ci=None, scatter_kws={'color':'r', 's':9})
plt.xlim(-10,310)
plt.ylim(ymin=0);
import plotly.plotly as py
import plotly.graph_objs as go
# Create a trace
trace = go.Scatter(
x = advertising.TV,
y = advertising.Sales,
mode = 'markers'
)
data = [trace]
# Plot and embed in ipython notebook!
py.iplot(data, filename='basic-scatter')
Let us create an interactive version of the scatterplot to see how TV advertising affects Sales. We can visually select some points (for example, apparent outliers) and then print them.
from __future__ import print_function
from bqplot import *
import numpy as np
import pandas as pd
from ipywidgets import Layout
x_sc = LinearScale()
y_sc = LinearScale()
x_data = np.arange(20)
y_data = np.random.randn(20)
scatter_chart = Scatter(x=advertising.TV, y=advertising.Sales, scales= {'x': x_sc, 'y': y_sc}, colors=['dodgerblue'],
interactions={'click': 'select'},
selected_style={'opacity': 1.0, 'fill': 'DarkOrange', 'stroke': 'Red'},
unselected_style={'opacity': 0.5})
ax_x = Axis(scale=x_sc)
ax_y = Axis(scale=y_sc, orientation='vertical', tick_format='0.2f')
Figure(marks=[scatter_chart], axes=[ax_x, ax_y])
# To retrieve the visually selected points as an array, use the .selected attribute.
scatter_chart.selected
[]
Note that the ISLR text reports the coefficients for the uncentered data, whereas the plot is based on centered data, which is visually more appealing for explaining the idea of a minimized RSS. In the book, the values on the β0 axis appear to have been adjusted to correspond with the text, presumably to avoid confusing the reader; the axes on the plots below are left unaltered.
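To make the connection explicit: centering TV changes only the intercept, which becomes the mean of Sales, while the slope is unchanged, and the uncentered intercept can be recovered as the centered intercept minus the slope times mean(TV). Here is a small sketch using the advertising data loaded above:
# Sketch: relationship between the centered and uncentered intercepts.
import numpy as np
from sklearn.linear_model import LinearRegression
X_raw = advertising.TV.values.reshape(-1, 1)
X_centered = X_raw - X_raw.mean()
y = advertising.Sales
fit_raw = LinearRegression().fit(X_raw, y)
fit_centered = LinearRegression().fit(X_centered, y)
print(fit_raw.coef_, fit_centered.coef_)   # identical slopes
print(fit_centered.intercept_, y.mean())   # centered intercept equals mean(Sales)
print(fit_raw.intercept_,
      fit_centered.intercept_ - fit_centered.coef_[0] * X_raw.mean())  # same value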
# Regression coefficients (Ordinary Least Squares)
regr = skl_lm.LinearRegression()
X = scale(advertising.TV, with_mean=True, with_std=False).reshape(-1,1)
y = advertising.Sales
regr.fit(X,y)
print(regr.intercept_)
print(regr.coef_)
14.0225
[ 0.04753664]
# Create grid coordinates for plotting
B0 = np.linspace(regr.intercept_-2, regr.intercept_+2, 50)
B1 = np.linspace(regr.coef_[0]-0.02, regr.coef_[0]+0.02, 50)  # use the scalar coefficient
xx, yy = np.meshgrid(B0, B1, indexing='xy')
Z = np.zeros((B0.size,B1.size))
# Calculate Z-values (RSS) based on grid of coefficients
for (i,j),v in np.ndenumerate(Z):
Z[i,j] =((y - (xx[i,j]+X.ravel()*yy[i,j]))**2).sum()/1000
# Minimized RSS
min_RSS = r'$\beta_0$, $\beta_1$ for minimized RSS'
min_rss = np.sum((regr.intercept_+regr.coef_*X - y.values.reshape(-1,1))**2)/1000
min_rss
2.1025305831313514
fig = plt.figure(figsize=(15,6))
fig.suptitle('RSS - Regression coefficients', fontsize=20)
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122, projection='3d')
# Left plot
CS = ax1.contour(xx, yy, Z, cmap=plt.cm.Set1, levels=[2.15, 2.2, 2.3, 2.5, 3])
ax1.scatter(regr.intercept_, regr.coef_[0], c='r', label=min_RSS)
ax1.clabel(CS, inline=True, fontsize=10, fmt='%1.1f')
# Right plot
ax2.plot_surface(xx, yy, Z, rstride=3, cstride=3, alpha=0.3)
ax2.contour(xx, yy, Z, zdir='z', offset=Z.min(), cmap=plt.cm.Set1,
alpha=0.4, levels=[2.15, 2.2, 2.3, 2.5, 3])
ax2.scatter3D(regr.intercept_, regr.coef_[0], min_rss, c='r', label=min_RSS)
ax2.set_zlabel('RSS')
ax2.set_zlim(Z.min(),Z.max())
ax2.set_ylim(0.02,0.07)
# settings common to both plots
for ax in fig.axes:
ax.set_xlabel(r'$\beta_0$', fontsize=17)
ax.set_ylabel(r'$\beta_1$', fontsize=17)
ax.set_yticks([0.03,0.04,0.05,0.06])
ax.legend()
est = smf.ols('Sales ~ TV', advertising).fit()
est.summary().tables[1]
 | coef | std err | t | P>|t| | [0.025 | 0.975]
---|---|---|---|---|---|---
Intercept | 7.0326 | 0.458 | 15.360 | 0.000 | 6.130 | 7.935 |
TV | 0.0475 | 0.003 | 17.668 | 0.000 | 0.042 | 0.053 |
# RSS with regression coefficients
((advertising.Sales - (est.params[0] + est.params[1]*advertising.TV))**2).sum()/1000
2.1025305831313506
regr = skl_lm.LinearRegression()
X = advertising.TV.values.reshape(-1,1)
y = advertising.Sales
regr.fit(X,y)
print(regr.intercept_)
print(regr.coef_)
7.03259354913
[ 0.04753664]
Sales_pred = regr.predict(X)
r2_score(y, Sales_pred)
0.61187505085007099
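For reference, R-squared is simply 1 - RSS/TSS, so the value above can be reproduced by hand from the residuals. A quick sketch reusing y and Sales_pred from the cell above:
# Sketch: computing R-squared manually as 1 - RSS/TSS.
import numpy as np
rss = np.sum((y - Sales_pred) ** 2)    # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
print(1 - rss / tss)                   # matches r2_score(y, Sales_pred)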
Next, we will plot the regression line for TV vs. Sales, then drag some points around to see how this impacts the regression line. This gives us an idea of whether we need to remove some outliers to obtain a model that is more consistent with the general direction of the data. (A non-interactive version of this check is sketched after the interactive figure below.)
import bqplot.marks as bqm
import bqplot.scales as bqs
import bqplot.axes as bqa
import bqplot as bq
from IPython.display import display
import ipywidgets as widgets
print('ipywidgets version', widgets.__version__)
print('bqplot version', bq.__version__)
def update_line(change):
# create line fit to data and display equation
lin.x = [np.min(scat.x), np.max(scat.x)]
poly = np.polyfit(scat.x, scat.y, 1)
lin.y = np.polyval(poly, lin.x)
label.value = 'y = {:.2f} + {:.2f}x'.format(poly[1], poly[0])
# create initial data set
size = 10
np.random.seed(0)
x_data = advertising.TV
y_data = advertising.Sales
# set up plot elements
sc_x = bqs.LinearScale()
sc_y = bqs.LinearScale()
ax_x = bqa.Axis(scale=sc_x)
ax_y = bqa.Axis(scale=sc_y, tick_format='0.2f', orientation='vertical')
# place data on scatter plot that allows point dragging
scat = bqm.Scatter(x=x_data,
y=y_data,
scales={'x': sc_x, 'y': sc_y},
enable_move=True)
# set up callback
scat.observe(update_line, names=['x', 'y'])
# linear fit line
lin = bqm.Lines(scales={'x': sc_x, 'y': sc_y})
# equation label
label = widgets.Label()
# containers
fig = bq.Figure(marks=[scat, lin], axes=[ax_x, ax_y])
box = widgets.VBox([label, fig])
# initialize plot and equation and display
update_line(None)
display(box)
ipywidgets version 7.2.1
bqplot version 0.10.5
[Interactive figure output; initial fit label: y = 7.03 + 0.05x]
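A non-interactive sketch of the same idea is to refit after dropping the points with the largest residuals and compare the coefficients; the 5% cutoff below is an arbitrary illustrative choice, not a recommendation:
# Sketch: refit Sales ~ TV after dropping the largest-residual points.
import numpy as np
import statsmodels.formula.api as smf
full_fit = smf.ols('Sales ~ TV', advertising).fit()
resid = np.abs(full_fit.resid)
keep = resid < np.percentile(resid, 95)    # drop the top 5% of residuals
trimmed_fit = smf.ols('Sales ~ TV', advertising[keep]).fit()
print(full_fit.params)
print(trimmed_fit.params)                  # compare how the slope/intercept shift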
Let us start by including two separate variables (Radio and Newspaper) in two different models.
est = smf.ols('Sales ~ Radio', advertising).fit()
est.summary().tables[1]
 | coef | std err | t | P>|t| | [0.025 | 0.975]
---|---|---|---|---|---|---
Intercept | 9.3116 | 0.563 | 16.542 | 0.000 | 8.202 | 10.422 |
Radio | 0.2025 | 0.020 | 9.921 | 0.000 | 0.162 | 0.243 |
est = smf.ols('Sales ~ Newspaper', advertising).fit()
est.summary().tables[1]
 | coef | std err | t | P>|t| | [0.025 | 0.975]
---|---|---|---|---|---|---
Intercept | 12.3514 | 0.621 | 19.876 | 0.000 | 11.126 | 13.577 |
Newspaper | 0.0547 | 0.017 | 3.300 | 0.001 | 0.022 | 0.087 |
Now let us include all three variables (TV, Radio and Newspaper) together in a single model.
est = smf.ols('Sales ~ TV + Radio + Newspaper', advertising).fit()
est.summary()
Dep. Variable: | Sales | R-squared: | 0.897 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.896 |
Method: | Least Squares | F-statistic: | 570.3 |
Date: | Wed, 20 Jun 2018 | Prob (F-statistic): | 1.58e-96 |
Time: | 19:03:45 | Log-Likelihood: | -386.18 |
No. Observations: | 200 | AIC: | 780.4 |
Df Residuals: | 196 | BIC: | 793.6 |
Df Model: | 3 | | |
Covariance Type: | nonrobust | | |
 | coef | std err | t | P>|t| | [0.025 | 0.975]
---|---|---|---|---|---|---
Intercept | 2.9389 | 0.312 | 9.422 | 0.000 | 2.324 | 3.554 |
TV | 0.0458 | 0.001 | 32.809 | 0.000 | 0.043 | 0.049 |
Radio | 0.1885 | 0.009 | 21.893 | 0.000 | 0.172 | 0.206 |
Newspaper | -0.0010 | 0.006 | -0.177 | 0.860 | -0.013 | 0.011 |
Omnibus: | 60.414 | Durbin-Watson: | 2.084 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 151.241 |
Skew: | -1.327 | Prob(JB): | 1.44e-33 |
Kurtosis: | 6.332 | Cond. No. | 454. |
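The statistics in this summary can also be pulled off the fitted results object programmatically, which is often more convenient than reading the table. A short sketch using the est object fitted above:
# Sketch: accessing fit statistics directly from the statsmodels results object.
print(est.rsquared, est.rsquared_adj)   # R-squared and adjusted R-squared
print(est.params['Newspaper'])          # a single coefficient
print(est.conf_int())                   # 95% confidence intervals
print(est.pvalues)                      # p-values for each term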
advertising.corr()
 | TV | Radio | Newspaper | Sales
---|---|---|---|---
TV | 1.000000 | 0.054809 | 0.056648 | 0.782224 |
Radio | 0.054809 | 1.000000 | 0.354104 | 0.576223 |
Newspaper | 0.056648 | 0.354104 | 1.000000 | 0.228299 |
Sales | 0.782224 | 0.576223 | 0.228299 | 1.000000 |
Let us do a multiple regression using sklearn.
regr = skl_lm.LinearRegression()
X = advertising[['Radio', 'TV']].values  # .as_matrix() was removed in newer pandas
y = advertising.Sales
regr.fit(X,y)
print(regr.coef_)
print(regr.intercept_)
regr_model = regr
def predict(tv, radio):
    # Keep the feature order consistent with the training matrix: [Radio, TV]
    data = pd.DataFrame({'Radio': [radio], 'TV': [tv]})
return regr_model.predict(data)
#prediction = predict(advertising[['TV','Radio']])
prediction = predict(180.8,10.8)
print(prediction)
[ 0.18799423  0.04575482]
2.92109991241
[ 13.22390813]
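As a sanity check, the prediction above can be reproduced by hand from the fitted coefficients, remembering that the coefficient order matches the training matrix (Radio first, then TV):
# Sketch: reproduce the prediction manually from the fitted coefficients.
radio, tv = 10.8, 180.8
manual = regr.intercept_ + regr.coef_[0] * radio + regr.coef_[1] * tv
print(manual)   # roughly 2.921 + 0.188*10.8 + 0.0458*180.8, i.e. about 13.22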
Let us create a 3-D visualization of the fitted regression plane (hyperplane) using matplotlib. This involves a bit of Python coding, as shown below.
# What are the min/max values of Radio & TV?
# Use these values to set up the grid for plotting.
advertising[['Radio', 'TV']].describe()
 | Radio | TV
---|---|---
count | 200.000000 | 200.000000 |
mean | 23.264000 | 147.042500 |
std | 14.846809 | 85.854236 |
min | 0.000000 | 0.700000 |
25% | 9.975000 | 74.375000 |
50% | 22.900000 | 149.750000 |
75% | 36.525000 | 218.825000 |
max | 49.600000 | 296.400000 |
# Create a coordinate grid
Radio = np.arange(0,50)
TV = np.arange(0,300)
B1, B2 = np.meshgrid(Radio, TV, indexing='xy')
Z = np.zeros((TV.size, Radio.size))
# Evaluate the fitted plane (intercept + coefficients) over the grid
for (i,j),v in np.ndenumerate(Z):
Z[i,j] =(regr.intercept_ + B1[i,j]*regr.coef_[0] + B2[i,j]*regr.coef_[1])
# Create plot
fig = plt.figure(figsize=(10,6))
fig.suptitle('Regression: Sales ~ Radio + TV Advertising', fontsize=20)
ax = axes3d.Axes3D(fig)
ax.plot_surface(B1, B2, Z, rstride=10, cstride=5, alpha=0.4)
ax.scatter3D(advertising.Radio, advertising.TV, advertising.Sales, c='r')
ax.set_xlabel('Radio')
ax.set_xlim(0,50)
ax.set_ylabel('TV')
ax.set_ylim(ymin=0)
ax.set_zlabel('Sales');
est = smf.ols('Sales ~ TV + Radio + TV*Radio', advertising).fit()
est.summary().tables[1]
 | coef | std err | t | P>|t| | [0.025 | 0.975]
---|---|---|---|---|---|---
Intercept | 6.7502 | 0.248 | 27.233 | 0.000 | 6.261 | 7.239 |
TV | 0.0191 | 0.002 | 12.699 | 0.000 | 0.016 | 0.022 |
Radio | 0.0289 | 0.009 | 3.241 | 0.001 | 0.011 | 0.046 |
TV:Radio | 0.0011 | 5.24e-05 | 20.727 | 0.000 | 0.001 | 0.001 |
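With the interaction term included, the effect of TV on Sales depends on the Radio budget: the slope of Sales with respect to TV is approximately 0.0191 + 0.0011 * Radio. A short sketch evaluating this marginal effect at a few illustrative Radio levels:
# Sketch: marginal effect of TV on Sales in the interaction model.
b_tv = est.params['TV']
b_inter = est.params['TV:Radio']
for radio in [0, 25, 50]:
    print(radio, b_tv + b_inter * radio)   # slope of Sales with respect to TV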
# With Seaborn's regplot() you can easily plot higher order polynomials.
plt.scatter(auto.horsepower, auto.mpg, facecolors='None', edgecolors='k', alpha=.5)
sns.regplot(auto.horsepower, auto.mpg, ci=None, label='Linear', scatter=False, color='orange')
sns.regplot(auto.horsepower, auto.mpg, ci=None, label='Degree 2', order=2, scatter=False, color='lightblue')
sns.regplot(auto.horsepower, auto.mpg, ci=None, label='Degree 5', order=5, scatter=False, color='g')
plt.legend()
plt.ylim(5,55)
plt.xlim(40,240);
# Scientific libraries
from numpy import arange,array,ones
from scipy import stats
# Toy arrays left over from a template example; not used in the fit below
xi = arange(0,9)
A = array([ xi, ones(9)])
# (Almost) linear sequence
y = [19, 20, 20.5, 21.5, 22, 23, 23, 25.5, 24]
# Generated linear fit
slope, intercept, r_value, p_value, std_err = stats.linregress(auto.horsepower,auto.mpg)
line = slope*auto.horsepower+intercept
print(slope)
print(intercept)
# Creating the dataset, and generating the plot
trace1 = go.Scatter(
x = auto.horsepower,
y = auto.mpg,
mode='markers',
marker=go.Marker(color='rgb(255, 127, 14)'),
name='Data'
)
trace2 = go.Scatter(
x=auto.horsepower,
y=line,
mode='lines',
marker=go.Marker(color='rgb(31, 119, 180)'),
name='Fit'
)
#annotation = go.Annotation(
# x=3.5,
# y=23.5,
# text='$R^2 = 0.9551,\\Y = 0.716X + 19.18$',
# showarrow=False,
# font=go.Font(size=16)
# )
layout = go.Layout(
title='Linear Fit in Python',
plot_bgcolor='rgb(229, 229, 229)',
xaxis=go.XAxis(zerolinecolor='rgb(255,255,255)', gridcolor='rgb(255,255,255)'),
yaxis=go.YAxis(zerolinecolor='rgb(255,255,255)', gridcolor='rgb(255,255,255)')
# annotations=[annotation]
)
data = [trace1, trace2]
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='Linear-Fit-in-python')
-0.157844733354
39.9358610212
# Generated linear fit
slope2, intercept2, r_value2, p_value2, std_err2 = stats.linregress(advertising.TV,advertising.Sales)
line2 = slope2*advertising.TV+intercept2
auto['horsepower2'] = auto.horsepower**2
auto.head(3)
 | mpg | cylinders | displacement | horsepower | weight | acceleration | year | origin | name | horsepower2
---|---|---|---|---|---|---|---|---|---|---
0 | 18.0 | 8 | 307.0 | 130.0 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu | 16900.0 |
1 | 15.0 | 8 | 350.0 | 165.0 | 3693 | 11.5 | 70 | 1 | buick skylark 320 | 27225.0 |
2 | 18.0 | 8 | 318.0 | 150.0 | 3436 | 11.0 | 70 | 1 | plymouth satellite | 22500.0 |
est = smf.ols('mpg ~ horsepower + horsepower2', auto).fit()
est.summary().tables[1]
 | coef | std err | t | P>|t| | [0.025 | 0.975]
---|---|---|---|---|---|---
Intercept | 56.9001 | 1.800 | 31.604 | 0.000 | 53.360 | 60.440 |
horsepower | -0.4662 | 0.031 | -14.978 | 0.000 | -0.527 | -0.405 |
horsepower2 | 0.0012 | 0.000 | 10.080 | 0.000 | 0.001 | 0.001 |
regr = skl_lm.LinearRegression()
# Linear fit
X = auto.horsepower.values.reshape(-1,1)
y = auto.mpg
regr.fit(X, y)
auto['pred1'] = regr.predict(X)
auto['resid1'] = auto.mpg - auto.pred1
# Quadratic fit
X2 = auto[['horsepower', 'horsepower2']].values  # .as_matrix() was removed in newer pandas
regr.fit(X2, y)
auto['pred2'] = regr.predict(X2)
auto['resid2'] = auto.mpg - auto.pred2
fig, (ax1,ax2) = plt.subplots(1,2, figsize=(12,5))
# Left plot
sns.regplot(auto.pred1, auto.resid1, lowess=True,
ax=ax1, line_kws={'color':'r', 'lw':1},
scatter_kws={'facecolors':'None', 'edgecolors':'k', 'alpha':0.5})
ax1.hlines(0,xmin=ax1.xaxis.get_data_interval()[0],
xmax=ax1.xaxis.get_data_interval()[1], linestyles='dotted')
ax1.set_title('Residual Plot for Linear Fit')
# Right plot
sns.regplot(auto.pred2, auto.resid2, lowess=True,
line_kws={'color':'r', 'lw':1}, ax=ax2,
scatter_kws={'facecolors':'None', 'edgecolors':'k', 'alpha':0.5})
ax2.hlines(0,xmin=ax2.xaxis.get_data_interval()[0],
xmax=ax2.xaxis.get_data_interval()[1], linestyles='dotted')
ax2.set_title('Residual Plot for Quadratic Fit')
for ax in fig.axes:
ax.set_xlabel('Fitted values')
ax.set_ylabel('Residuals')
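To put a number on what the residual plots suggest, we can compare the mean squared error and R-squared of the linear and quadratic fits. A quick sketch using the pred1 and pred2 columns created above:
# Sketch: quantify the linear vs. quadratic comparison from the residual plots.
from sklearn.metrics import mean_squared_error, r2_score
print('linear   ', mean_squared_error(auto.mpg, auto.pred1), r2_score(auto.mpg, auto.pred1))
print('quadratic', mean_squared_error(auto.mpg, auto.pred2), r2_score(auto.mpg, auto.pred2))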
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
from IPython import display
# Here is the App display embedded in the Jupyter notebook.
#This can be run only once in offline mode.
def show_app(app, # type: dash.Dash
port=9999,
width=1000,
height=350,
offline=True,
style=True,
**dash_flask_kwargs):
"""
Run the application inside a Jupyter notebook and show an iframe with it
:param app:
:param port:
:param width:
:param height:
:param offline:
:return:
"""
url = 'http://localhost:%d' % port
iframe = '<iframe src="{url}" width={width} height={height}></iframe>'.format(url=url,
width=width,
height=height)
display.display_html(iframe, raw=True)
if offline:
app.css.config.serve_locally = False
app.scripts.config.serve_locally = False
if style:
external_css = ["https://fonts.googleapis.com/css?family=Raleway:400,300,600",
"https://maxcdn.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css",
"http://getbootstrap.com/dist/css/bootstrap.min.css",
"https://codepen.io/chriddyp/pen/bWLwgP.css"]
for css in external_css:
app.css.append_css({"external_url": css})
external_js = ["https://code.jquery.com/jquery-3.2.1.min.js",
"https://cdn.rawgit.com/plotly/dash-app-stylesheets/a3401de132a6d0b652ba11548736b1d1e80aa10d/dash-goldman-sachs-report-js.js",
"http://getbootstrap.com/dist/js/bootstrap.min.js"]
for js in external_js:
app.scripts.append_script({"external_url": js})
    return app.run_server(debug=True,  # set debug=False if the reloader misbehaves in Jupyter
port=port,
**dash_flask_kwargs)
# Here is the Dash App Layout and interaction behaviour defined.
app = dash.Dash()
#stylesheets = {'stylesheet.css': 'https://codepen.io/chriddyp/pen/bWLwgP.css'}
app.layout = html.Div(children=[
html.Link(
rel='stylesheet',
href='/static/bWLwgP.css'
),
html.H1(children='Predicting Sales using Regression'),
html.Div(children=[html.Label('TV Advertising Spend in $million '),
dcc.Input(id='tv-id', value='230.1', type='text')]),
html.Div(children=[html.Label('Radio Advertising Spend in $million '),
dcc.Input(id='radio-id', value='37.8', type='text')]),
html.Div(id='predicted-div'),
dcc.Graph(
id='advert-graph',
figure={
'data': [
go.Scatter(
x=advertising['TV'],
y=advertising['Sales'],
#text='' + df2['TV'],
mode='markers',
opacity=0.7,
marker={
'size': 15,
'line': {'width': 0.5, 'color': 'white'}
}
)
],
'layout': go.Layout(
xaxis={'type': 'log', 'title': 'TV Ad Spend'},
yaxis={'title': 'Sales Turnover'},
margin={'l': 40, 'b': 40, 't': 10, 'r': 10},
legend={'x': 0, 'y': 1},
hovermode='closest'
)
}
)
])
@app.callback(
Output(component_id='predicted-div', component_property='children'),
[Input(component_id='tv-id', component_property='value'),
Input(component_id='radio-id', component_property='value')]
)
def update_output_div(tv_value,radio_value):
    # Dash passes the input values as strings, so convert them before predicting
    prediction = predict(float(tv_value), float(radio_value))
return 'You\'ve entered TV Spend as "{}" and Radio spend as "{}". And the predicted sales is "{}"'.format(tv_value, radio_value, prediction[0])
show_app(app)
* Running on http://127.0.0.1:9999/ (Press CTRL+C to quit)