This tutorial we study the practical application of basic forecasting methods in Python. Exponential smoothing is not available in any standard Python package, so that they are coded on the forecast.py file, which you need to download from Blackboard.
Data: Australian CPI inflation
Exploratory data analysis
Random Walk
Simple Exponential Smoothing
Model diagnostics
Model validation
Forecast
This notebook relies on the following imports and settings.
# Packages
import warnings
warnings.filterwarnings("ignore")
import numpy as np
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
# Plot settings
sns.set_context('notebook')
sns.set_style('ticks')
red='#D62728'
blue='#1F77B4'
%matplotlib inline
Our data is the quarterly change in the Consumer Price Index (CPI) calculated by the Australian Bureau of Statistics. The original dataset is in the ABS website, where you can also find the explanatory notes. We use the index for all expenditure groups, which according to the ABS documentation already contains seasonal adjustments for components that are subject to calendar effects.
We start by loading the data and converting the index to quarterly periods (note that we have to specify this frequency when converting the index). We focus on the data since 1980, which has a total of 146 observations.
data=pd.read_csv('inflation.csv', index_col='Date', parse_dates=True, dayfirst=True)
data.index=data.index.to_period(freq='Q') # converting the index to quarterly period instead of dates
data=data['01-1980':] # filtering the use data from Jan/1980 onwards
data.tail()
Inflation | |
---|---|
Date | |
2016Q2 | 0.4 |
2016Q3 | 0.7 |
2016Q4 | 0.5 |
2017Q1 | 0.5 |
2017Q2 | 0.2 |
For univariate time series modelling, it is better to work with a pandas series rather than dataframe.
y=data['Inflation']
The first step in our analysis is a time series plot. We can see that both the level and volatility of inflation is much lower in recent times than it was in the 80s. There is a noticeable outlier in the third quarter of 2000 due to introduction of the GST in Australia.
fig, ax= plt.subplots(figsize=(8,5))
y.plot(color=red)
ax.set_xlabel('')
ax.set_ylabel('Inflation')
ax.set_title('Australian Quarterly CPI Inflation')
ax.set_xticks([], minor=True) # I prefer to remove the minor ticks for a cleaner plot
sns.despine()
plt.show()
y.describe().round(2)
count 150.00 mean 1.00 std 0.88 min -0.50 25% 0.40 50% 0.70 75% 1.58 max 4.10 Name: Inflation, dtype: float64
In this section we use the random walk method to illustrate the process that we will follow to generate forecasts in the tutorials.
Specify the forecast horizon.
Create a range of dates or periods starting from the time index following the last observation in the data.
Generate the forecasts and store them in a series indexed by step 2.
Below, we generate point forecasts for one to four quarters after the end of the series.
h = 4
test=pd.period_range(start=y.index[-1]+1, periods=h, freq='Q')
pred=pd.Series(np.repeat(y.iloc[-1], h), index=test) # the forecast repeats the last observed values h times
pred
2017Q3 0.2 2017Q4 0.2 2018Q1 0.2 2018Q2 0.2 Freq: Q-DEC, dtype: float64
To compute interval forecasts, we first estimate the standard deviation of the errors.
resid=y-y.shift(1) # the shift lags the series by one period
sigma = resid.std()
round(sigma,3)
0.797
Using the formulas from the lecture, the interval forecasts are as below.
intv = pd.concat([pred-stats.norm.ppf(0.975)*sigma*np.sqrt(np.arange(1,h+1)),
pred+stats.norm.ppf(0.975)*sigma*np.sqrt(np.arange(1,h+1))], axis=1)
intv.round(3)
0 | 1 | |
---|---|---|
2017Q3 | -1.362 | 1.762 |
2017Q4 | -2.009 | 2.409 |
2018Q1 | -2.505 | 2.905 |
2018Q2 | -2.924 | 3.324 |
The exponential smoothing functions are in the forecast module from the LMS.
import forecast # you need to download the forecast.py file from the LMS
ses=forecast.ses(y)
ses.fit()
fitted=pd.Series(ses.smooth(), index=y.index)
ses.summary()
Simple exponential smoothing Smoothing parameter: alpha 0.288 (0.072) In-sample fit: MSE 0.424 Log-likelihood -148.486 AIC 302.971 BIC 312.003
fig, ax= plt.subplots(figsize=(8,5))
y.plot(color=red, label='Inflation')
fitted.plot(color='black', label='Simple exponential smoothing fit', alpha=0.8)
ax.set_xlabel('')
ax.set_ylabel('Inflation')
ax.set_title('Australian Quarterly CPI Inflation')
ax.set_xticks([], minor=True)
plt.legend(loc='best')
sns.despine()
plt.show()
We now conduct residual diagnostics for the exponential smoothing. Recall from the lectures that the key diagnostics for univariate time series are:
We compute the residulas as follows.
resid=y-fitted
The following cells compute the diagnostics. We find that the residuals are uncorrelated, have non-constant variance (due to higher volatitility in the 80s), and are non-Gaussian.
fig, ax= plt.subplots(figsize=(8,5))
resid.plot(color=blue)
ax.set_xlabel('')
ax.set_xticks([], minor=True)
ax.set_title('Residual plot')
sns.despine()
plt.show()
fig, ax = plt.subplots(figsize=(8,5))
sm.graphics.tsa.plot_acf(resid, lags=40, ax=ax)
sns.despine()
plt.show()
def hist(series):
fig, ax= plt.subplots(figsize=(8,5))
sns.distplot(series, ax=ax, hist_kws={'alpha': 0.8, 'edgecolor':'black', 'color': blue},
kde_kws={'color': 'black', 'alpha': 0.7})
sns.despine()
return fig, ax
hist(resid)
plt.show()
Based on this finding, I use only the post-80s data for the rest of the analysis, which will probably lead to a more relevant estimate of the variance of the errors for forecasting inflation.
#y=y['1991':]
ses=forecast.ses(y)
ses.fit()
ses.summary()
Simple exponential smoothing Smoothing parameter: alpha 0.288 (0.072) In-sample fit: MSE 0.424 Log-likelihood -148.486 AIC 302.971 BIC 312.003
We implement a real time forecasting exercise to compare the random walk and simple exponential smoothing methods.
# Real time forecasting - use it as a template
validation=y['2004Q1':].index # the validation period is Q1 2004 onwards
start = y.index.get_loc('2004Q1') # numerical index corresponding to Q1 2005
pred1 = []
pred2 = []
actual= []
for i in range(start, len(y)):
actual.append(y.iloc[i]) # actual value
pred1.append(y.iloc[i-1]) # random walk forecast
model = forecast.ses(y.iloc[:i])
model.fit()
pred2.append(model.forecast(1)[0]) # SES forecast
columns=['RW', 'SES', 'Actual']
results = np.vstack([pred1,pred2,actual]).T
results = pd.DataFrame(results, columns=columns, index=validation)
We find that simple exponential smoothing generates more accurate forecasts.
from statlearning import rmse_jack
table = pd.DataFrame(0.0, index=results.columns[:-1], columns=['RMSE','SE'])
for i in range(2):
table.iloc[i,0], table.iloc[i,1] = rmse_jack(results.iloc[:,i], results.iloc[:,-1])
table.round(3)
RMSE | SE | |
---|---|---|
RW | 0.538 | 0.058 |
SES | 0.499 | 0.055 |
We use a fan chart to report our final forecast. For now, the prediction interval is based on the normal distribution, even though we saw that this is not a good assumption for this data.
h=12
model = ses
test=pd.period_range(start=y.index[-1]+1, periods=h, freq='Q')
pred=pd.Series(model.forecast(h), index=test)
intv1=pd.DataFrame(model.intervalforecast(h, level=.8), index=test)
intv2=pd.DataFrame(model.intervalforecast(h, level=.9), index=test)
intv3=pd.DataFrame(model.intervalforecast(h, level=.99), index=test)
fig, ax = forecast.fanchart(y['01-2012':], pred, intv1, intv2, intv3)
ax.set_xlabel('')
ax.set_xticks([], minor=True)
plt.title('Inflation forecast (simple exponential smoothing)')
sns.despine()
plt.show()