Predictive Analytics (QBUS2820)

Tutorial 11: Forecasting

This tutorial we study the practical application of basic forecasting methods in Python. Exponential smoothing is not available in any standard Python package, so that they are coded on the forecast.py file, which you need to download from Blackboard.

Data: Australian CPI inflation
Exploratory data analysis
Random Walk
Simple Exponential Smoothing
Model diagnostics
Model validation
Forecast

This notebook relies on the following imports and settings.

In [1]:

# Packages
import warnings
warnings.filterwarnings("ignore")
import numpy as np
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm

In [2]:

# Plot settings
sns.set_context('notebook') 
sns.set_style('ticks')
red='#D62728'
blue='#1F77B4'
%matplotlib inline

Data: Australian CPI inflation¶

Our data is the quarterly change in the Consumer Price Index (CPI) calculated by the Australian Bureau of Statistics. The original dataset is in the ABS website, where you can also find the explanatory notes. We use the index for all expenditure groups, which according to the ABS documentation already contains seasonal adjustments for components that are subject to calendar effects.

We start by loading the data and converting the index to quarterly periods (note that we have to specify this frequency when converting the index). We focus on the data since 1980, which has a total of 146 observations.

In [3]:

data=pd.read_csv('inflation.csv', index_col='Date', parse_dates=True, dayfirst=True)
data.index=data.index.to_period(freq='Q') # converting the index to quarterly period instead of dates
data=data['01-1980':] # filtering the use data from Jan/1980 onwards
data.tail()

Out[3]:

	Inflation
Date
2016Q2	0.4
2016Q3	0.7
2016Q4	0.5
2017Q1	0.5
2017Q2	0.2

For univariate time series modelling, it is better to work with a pandas series rather than dataframe.

In [4]:

y=data['Inflation']

Exploratory data analysis¶

The first step in our analysis is a time series plot. We can see that both the level and volatility of inflation is much lower in recent times than it was in the 80s. There is a noticeable outlier in the third quarter of 2000 due to introduction of the GST in Australia.

In [5]:

fig, ax= plt.subplots(figsize=(8,5))
y.plot(color=red)
ax.set_xlabel('')
ax.set_ylabel('Inflation')
ax.set_title('Australian Quarterly CPI Inflation')
ax.set_xticks([], minor=True) # I prefer to remove the minor ticks for a cleaner plot
sns.despine()
plt.show()

In [6]:

y.describe().round(2)

Out[6]:

count    150.00
mean       1.00
std        0.88
min       -0.50
25%        0.40
50%        0.70
75%        1.58
max        4.10
Name: Inflation, dtype: float64

Random walk¶

In this section we use the random walk method to illustrate the process that we will follow to generate forecasts in the tutorials.

Specify the forecast horizon.
Create a range of dates or periods starting from the time index following the last observation in the data.
Generate the forecasts and store them in a series indexed by step 2.

Below, we generate point forecasts for one to four quarters after the end of the series.

In [7]:

h = 4
test=pd.period_range(start=y.index[-1]+1, periods=h, freq='Q')
pred=pd.Series(np.repeat(y.iloc[-1], h), index=test) # the forecast repeats the last observed values h times
pred

Out[7]:

2017Q3    0.2
2017Q4    0.2
2018Q1    0.2
2018Q2    0.2
Freq: Q-DEC, dtype: float64

To compute interval forecasts, we first estimate the standard deviation of the errors.

In [8]:

resid=y-y.shift(1) # the shift lags the series by one period
sigma = resid.std()
round(sigma,3)

Out[8]:

0.797

Using the formulas from the lecture, the interval forecasts are as below.

In [9]:

intv = pd.concat([pred-stats.norm.ppf(0.975)*sigma*np.sqrt(np.arange(1,h+1)),
                   pred+stats.norm.ppf(0.975)*sigma*np.sqrt(np.arange(1,h+1))], axis=1)
intv.round(3)

Out[9]:

	0	1
2017Q3	-1.362	1.762
2017Q4	-2.009	2.409
2018Q1	-2.505	2.905
2018Q2	-2.924	3.324

Simple exponential smoothing¶

The exponential smoothing functions are in the forecast module from the LMS.

In [10]:

import forecast # you need to download the forecast.py file from the LMS

ses=forecast.ses(y)
ses.fit()
fitted=pd.Series(ses.smooth(), index=y.index)
ses.summary()

 Simple exponential smoothing

 Smoothing parameter:
 alpha  0.288 (0.072) 

 In-sample fit:
 MSE               0.424
 Log-likelihood   -148.486
 AIC               302.971
 BIC               312.003

In [11]:

fig, ax= plt.subplots(figsize=(8,5))
y.plot(color=red, label='Inflation')
fitted.plot(color='black', label='Simple exponential smoothing fit', alpha=0.8)
ax.set_xlabel('')
ax.set_ylabel('Inflation')
ax.set_title('Australian Quarterly CPI Inflation')
ax.set_xticks([], minor=True) 
plt.legend(loc='best')
sns.despine()
plt.show()

Model diagnostics¶

We now conduct residual diagnostics for the exponential smoothing. Recall from the lectures that the key diagnostics for univariate time series are:

Residual plot.
Residula ACF.
Residual distribution plots.

We compute the residulas as follows.

In [12]:

resid=y-fitted

The following cells compute the diagnostics. We find that the residuals are uncorrelated, have non-constant variance (due to higher volatitility in the 80s), and are non-Gaussian.

In [13]:

fig, ax= plt.subplots(figsize=(8,5))
resid.plot(color=blue)
ax.set_xlabel('')
ax.set_xticks([], minor=True) 
ax.set_title('Residual plot')
sns.despine()
plt.show()

In [14]:

fig, ax = plt.subplots(figsize=(8,5))
sm.graphics.tsa.plot_acf(resid, lags=40, ax=ax)
sns.despine()
plt.show()

In [15]:

def hist(series):
    fig, ax= plt.subplots(figsize=(8,5))
    sns.distplot(series, ax=ax, hist_kws={'alpha': 0.8, 'edgecolor':'black', 'color': blue},  
                 kde_kws={'color': 'black', 'alpha': 0.7})
    sns.despine()
    return fig, ax

hist(resid)
plt.show()

Based on this finding, I use only the post-80s data for the rest of the analysis, which will probably lead to a more relevant estimate of the variance of the errors for forecasting inflation.

In [16]:

#y=y['1991':]
ses=forecast.ses(y)
ses.fit()
ses.summary()

 Simple exponential smoothing

 Smoothing parameter:
 alpha  0.288 (0.072) 

 In-sample fit:
 MSE               0.424
 Log-likelihood   -148.486
 AIC               302.971
 BIC               312.003

Model validation¶

We implement a real time forecasting exercise to compare the random walk and simple exponential smoothing methods.

In [17]:

# Real time forecasting - use it as a template

validation=y['2004Q1':].index # the validation period is Q1 2004 onwards
start = y.index.get_loc('2004Q1') # numerical index corresponding to Q1 2005

pred1 = []
pred2 = []
actual= []
for i in range(start, len(y)):
    
    actual.append(y.iloc[i]) # actual value
    
    pred1.append(y.iloc[i-1]) # random walk forecast
    
    model = forecast.ses(y.iloc[:i]) 
    model.fit()
    pred2.append(model.forecast(1)[0]) # SES forecast

columns=['RW', 'SES', 'Actual']
results = np.vstack([pred1,pred2,actual]).T
results = pd.DataFrame(results, columns=columns, index=validation)

We find that simple exponential smoothing generates more accurate forecasts.

In [18]:

from statlearning import rmse_jack

table = pd.DataFrame(0.0, index=results.columns[:-1], columns=['RMSE','SE'])
for i in range(2):
    table.iloc[i,0], table.iloc[i,1] = rmse_jack(results.iloc[:,i], results.iloc[:,-1])
table.round(3)

Out[18]:

	RMSE	SE
RW	0.538	0.058
SES	0.499	0.055

Forecast¶

We use a fan chart to report our final forecast. For now, the prediction interval is based on the normal distribution, even though we saw that this is not a good assumption for this data.

In [19]:

h=12

model = ses

test=pd.period_range(start=y.index[-1]+1, periods=h, freq='Q')

pred=pd.Series(model.forecast(h), index=test)

intv1=pd.DataFrame(model.intervalforecast(h, level=.8), index=test)
intv2=pd.DataFrame(model.intervalforecast(h, level=.9), index=test)
intv3=pd.DataFrame(model.intervalforecast(h, level=.99), index=test)

fig, ax = forecast.fanchart(y['01-2012':], pred, intv1, intv2, intv3)
ax.set_xlabel('')
ax.set_xticks([], minor=True)
plt.title('Inflation forecast (simple exponential smoothing)')

sns.despine()
plt.show()