Checking Autocorrelation of Electricity Usage

This IPython notebook was composed by Justin Elszasz.

The simplest way to forecast electricity consumption is to look at its own recent values. This notebook examines how strongly hourly electricity consumption is related to consumption in previous hours.

In [16]:
import pandas as pd
import numpy as np
import import_funcs
from matplotlib import pyplot as plt
# autocorrelation_plot lives in pandas.plotting (pandas.tools.plotting in old pandas releases)
from pandas.plotting import autocorrelation_plot
import statsmodels.api as sm
In [34]:
gray_light = '#d4d4d2'
gray_med = '#737373'
red_orange = '#ff4400'
blue_light = '#00d2ff'
In [18]:
# Import the BGE hourly electricity data and weather data using import_funcs.py class
elec_data = import_funcs.BGEdata()
weather = import_funcs.weather()

# Merge into one Pandas dataframe
elec_and_weather = pd.merge(weather,elec_data,left_index=True,right_index=True)

# Clean up a bit
del elec_and_weather['tempm'], elec_and_weather['COST'], elec_and_weather['NOTES'], elec_and_weather['UNITS'], elec_and_weather['TYPE']
del elec_and_weather['precipm']

# Convert wind speed to miles per hour (0.62 is the approximate km-to-mile factor)
elec_and_weather['wspdMPH'] = elec_and_weather['wspdm'] * 0.62
del elec_and_weather['wspdm']
In [19]:
# Add historic usage to each X vector

# Set number of hours prediction is in advance
n_hours_advance = 1

# Set number of historic hours used
n_hours_window = 48

# need to do this for range function
#n_hours_window += 1

# Column 'X_t-k' holds the value of X from k rows earlier
# (k hours earlier when no hourly rows are missing)
for k in range(n_hours_advance, n_hours_advance + n_hours_window):
    for col in ['USAGE', 'tempF', 'hum', 'wspdMPH']:
        elec_and_weather['%s_t-%i' % (col, k)] = elec_and_weather[col].shift(k)

# Drop the leading rows that do not have a full window of history
elec_and_weather = elec_and_weather.iloc[n_hours_advance + n_hours_window:]
     
    
# Restrict the analysis to 18 January through 31 March 2014
elec_and_weather = elec_and_weather.loc['18-jan-2014 00:00:00':'31-mar-2014 23:00:00']
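
The lagged columns built in the cell above can be checked on a toy series. The sketch below is not part of the original analysis: the helper name `add_lags` and the five-row frame are made up for illustration, and it assumes the rows are evenly spaced so that a shift of `k` rows means `k` hours.

```python
import pandas as pd

def add_lags(frame, column, lags):
    """Return a copy of frame with one '<column>_t-k' column per lag k."""
    out = frame.copy()
    for k in lags:
        # shift(k) moves values down k rows, so row i receives the value from row i-k
        out['%s_t-%i' % (column, k)] = out[column].shift(k)
    # The first max(lags) rows have no complete history and contain NaN
    return out.dropna()

# Hypothetical toy data standing in for hourly usage
toy = pd.DataFrame({'USAGE': [1.0, 2.0, 3.0, 4.0, 5.0]})
lagged = add_lags(toy, 'USAGE', [1, 2])
print(lagged)
```

Each surviving row pairs a usage value with the values one and two rows before it, which is the layout the `USAGE_t-k` columns follow.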

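An autocorrelation plot shows, for each time lag (here in hours), the ratio of the autocovariance of the series at that lag to the variance of the series. A value near 1 means usage at that lag moves closely with current usage; a value near 0 means the two are essentially unrelated.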
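
As a rough illustration of that ratio, the sketch below computes it directly with NumPy. The function name `autocorrelation` and the synthetic daily sine wave are assumptions made for this example. Dividing both sums by the full series length `n` is one common convention, and it is assumed here to be close to what the plotting function uses.

```python
import numpy as np

def autocorrelation(x, lag):
    """Autocovariance at `lag` divided by the variance (both normalized by n)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mean = x.mean()
    variance = np.sum((x - mean) ** 2) / n
    # Pair each value with the value `lag` steps later
    autocovariance = np.sum((x[:n - lag] - mean) * (x[lag:] - mean)) / n
    return autocovariance / variance

# Synthetic signal with a 24-hour cycle, 10 days long (illustrative only)
hours = np.arange(240)
daily = np.sin(2 * np.pi * hours / 24)

print(autocorrelation(daily, 0))   # 1.0 by construction
print(autocorrelation(daily, 12))  # about -0.95: half a cycle out of phase
print(autocorrelation(daily, 24))  # about 0.90: one full cycle apart
```

A series with a daily rhythm therefore shows peaks at multiples of 24 hours and troughs halfway between them.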

In [70]:
fig = plt.figure()
plot = autocorrelation_plot(elec_and_weather['USAGE'])
fig.savefig('Elec_Autocorrelation.png')

The following lag plots compare each hour's usage with the usage a fixed number of hours earlier (1, 6, 12, 24, and 48 hours). Each plot also shows an ordinary least squares (OLS) fit and its $R^2$.

In [64]:
# 1 hour lag
model = sm.OLS(elec_and_weather['USAGE'],sm.add_constant(elec_and_weather['USAGE_t-1']))
res = model.fit()
print(res.summary())

fig = plt.figure(figsize=[5,5])
p1 = plt.plot(elec_and_weather['USAGE_t-1'], elec_and_weather['USAGE'], '.', color=gray_light, alpha=1.0, markersize=8)
# Draw the fitted line through its two end points only (min/max pairing is valid because the slope is positive)
x_ends = [min(elec_and_weather['USAGE_t-1']), max(elec_and_weather['USAGE_t-1'])]
y_ends = [min(res.fittedvalues), max(res.fittedvalues)]
plot_model = plt.plot(x_ends, y_ends, label='OLS $R^2$=%.2f' % res.rsquared, linewidth=1, color=red_orange)
plt.xlabel('Elec. Usage at Time $t-1$ hour (kWh)')
plt.ylabel('Elec. Usage at Time $t$ (kWh)')
plt.legend(loc="lower right")

fig.savefig('Elec_Lag_1hour.png')
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  USAGE   R-squared:                       0.943
Model:                            OLS   Adj. R-squared:                  0.943
Method:                 Least Squares   F-statistic:                 2.835e+04
Date:                Tue, 13 May 2014   Prob (F-statistic):               0.00
Time:                        20:37:31   Log-Likelihood:                 429.95
No. Observations:                1701   AIC:                            -855.9
Df Residuals:                    1699   BIC:                            -845.0
Df Model:                           1                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.0398      0.009      4.230      0.000         0.021     0.058
USAGE_t-1      0.9717      0.006    168.386      0.000         0.960     0.983
==============================================================================
Omnibus:                      215.819   Durbin-Watson:                   2.401
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             1771.927
Skew:                           0.287   Prob(JB):                         0.00
Kurtosis:                       7.967   Cond. No.                         4.41
==============================================================================
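
For a fit with a single regressor and an intercept, $R^2$ equals the square of the Pearson correlation between the two variables, which is what ties these lag regressions to the autocorrelation plot. The sketch below checks that identity numerically. It uses a synthetic autoregressive series rather than the BGE data, and the coefficient 0.9, the seed, and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Synthetic series in which each value depends on the previous one (illustrative only)
rng = np.random.default_rng(0)
n = 500
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.9 * x[t - 1] + rng.normal()

# Regress the series on its one-step lag
y, x_lag = x[1:], x[:-1]
slope, intercept = np.polyfit(x_lag, y, 1)
fitted = intercept + slope * x_lag

# R^2 from its definition: 1 - residual sum of squares / total sum of squares
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Pearson correlation between the series and its lag
r = np.corrcoef(x_lag, y)[0, 1]

print(r_squared, r ** 2)  # the two agree up to floating-point error
```
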
In [65]:
# 6 hour lag
model = sm.OLS(elec_and_weather['USAGE'],sm.add_constant(elec_and_weather['USAGE_t-6']))
res = model.fit()
print(res.summary())

fig = plt.figure(figsize=[5,5])
p1 = plt.plot(elec_and_weather['USAGE_t-6'], elec_and_weather['USAGE'], '.', color=gray_light, alpha=1.0, markersize=8)
# Draw the fitted line through its two end points only (min/max pairing is valid because the slope is positive)
x_ends = [min(elec_and_weather['USAGE_t-6']), max(elec_and_weather['USAGE_t-6'])]
y_ends = [min(res.fittedvalues), max(res.fittedvalues)]
plot_model = plt.plot(x_ends, y_ends, label='OLS $R^2$=%.2f' % res.rsquared, linewidth=1, color=red_orange)
plt.xlabel('Elec. Usage at Time $t-6$ hours (kWh)')
plt.ylabel('Elec. Usage at Time $t$ (kWh)')
plt.legend(loc="lower right")

fig.savefig('Elec_Lag_6hour.png')
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  USAGE   R-squared:                       0.840
Model:                            OLS   Adj. R-squared:                  0.840
Method:                 Least Squares   F-statistic:                     8894.
Date:                Tue, 13 May 2014   Prob (F-statistic):               0.00
Time:                        21:11:43   Log-Likelihood:                -456.90
No. Observations:                1701   AIC:                             917.8
Df Residuals:                    1699   BIC:                             928.7
Df Model:                           1                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.1137      0.016      7.148      0.000         0.082     0.145
USAGE_t-6      0.9180      0.010     94.310      0.000         0.899     0.937
==============================================================================
Omnibus:                      152.876   Durbin-Watson:                   0.665
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              875.978
Skew:                          -0.177   Prob(JB):                    6.08e-191
Kurtosis:                       6.498   Cond. No.                         4.42
==============================================================================
In [66]:
# 12 hour lag
model = sm.OLS(elec_and_weather['USAGE'],sm.add_constant(elec_and_weather['USAGE_t-12']))
res = model.fit()
print(res.summary())

fig = plt.figure(figsize=[5,5])
p1 = plt.plot(elec_and_weather['USAGE_t-12'], elec_and_weather['USAGE'], '.', color=gray_light, alpha=1.0, markersize=8)
# Draw the fitted line through its two end points only (min/max pairing is valid because the slope is positive)
x_ends = [min(elec_and_weather['USAGE_t-12']), max(elec_and_weather['USAGE_t-12'])]
y_ends = [min(res.fittedvalues), max(res.fittedvalues)]
plot_model = plt.plot(x_ends, y_ends, label='OLS $R^2$=%.2f' % res.rsquared, linewidth=1, color=red_orange)
plt.xlabel('Elec. Usage at Time $t-12$ hours (kWh)')
plt.ylabel('Elec. Usage at Time $t$ (kWh)')
plt.legend(loc="lower right")

fig.savefig('Elec_Lag_12hour.png')
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  USAGE   R-squared:                       0.768
Model:                            OLS   Adj. R-squared:                  0.768
Method:                 Least Squares   F-statistic:                     5629.
Date:                Tue, 13 May 2014   Prob (F-statistic):               0.00
Time:                        21:11:57   Log-Likelihood:                -770.34
No. Observations:                1701   AIC:                             1545.
Df Residuals:                    1699   BIC:                             1556.
Df Model:                           1                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.1644      0.019      8.564      0.000         0.127     0.202
USAGE_t-12     0.8803      0.012     75.026      0.000         0.857     0.903
==============================================================================
Omnibus:                      137.460   Durbin-Watson:                   0.432
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              780.046
Skew:                           0.024   Prob(JB):                    4.12e-170
Kurtosis:                       6.317   Cond. No.                         4.45
==============================================================================
In [72]:
# 24 hour lag
model = sm.OLS(elec_and_weather['USAGE'],sm.add_constant(elec_and_weather['USAGE_t-24']))
res = model.fit()
print(res.summary())

fig = plt.figure(figsize=[5,5])
p1 = plt.plot(elec_and_weather['USAGE_t-24'], elec_and_weather['USAGE'], '.', color=gray_light, alpha=1.0, markersize=8)
# Draw the fitted line through its two end points only (min/max pairing is valid because the slope is positive)
x_ends = [min(elec_and_weather['USAGE_t-24']), max(elec_and_weather['USAGE_t-24'])]
y_ends = [min(res.fittedvalues), max(res.fittedvalues)]
plot_model = plt.plot(x_ends, y_ends, label='OLS $R^2$=%.2f' % res.rsquared, linewidth=1, color=red_orange)
plt.xlabel('Elec. Usage at Time $t-24$ hours (kWh)')
plt.ylabel('Elec. Usage at Time $t$ (kWh)')
plt.legend(loc="lower right")

fig.savefig('Elec_Lag_24hour.png')
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  USAGE   R-squared:                       0.677
Model:                            OLS   Adj. R-squared:                  0.677
Method:                 Least Squares   F-statistic:                     3560.
Date:                Tue, 13 May 2014   Prob (F-statistic):               0.00
Time:                        21:17:10   Log-Likelihood:                -1052.5
No. Observations:                1701   AIC:                             2109.
Df Residuals:                    1699   BIC:                             2120.
Df Model:                           1                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.2372      0.023     10.440      0.000         0.193     0.282
USAGE_t-24     0.8248      0.014     59.664      0.000         0.798     0.852
==============================================================================
Omnibus:                      125.400   Durbin-Watson:                   0.247
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              550.153
Skew:                           0.191   Prob(JB):                    3.43e-120
Kurtosis:                       5.760   Cond. No.                         4.47
==============================================================================
In [71]:
# 48 hour lag
model = sm.OLS(elec_and_weather['USAGE'],sm.add_constant(elec_and_weather['USAGE_t-48']))
res = model.fit()
print(res.summary())

fig = plt.figure(figsize=[5,5])
p1 = plt.plot(elec_and_weather['USAGE_t-48'], elec_and_weather['USAGE'], '.', color=gray_light, alpha=1.0, markersize=8)
# Draw the fitted line through its two end points only (min/max pairing is valid because the slope is positive)
x_ends = [min(elec_and_weather['USAGE_t-48']), max(elec_and_weather['USAGE_t-48'])]
y_ends = [min(res.fittedvalues), max(res.fittedvalues)]
plot_model = plt.plot(x_ends, y_ends, label='OLS $R^2$=%.2f' % res.rsquared, linewidth=1, color=red_orange)
plt.xlabel('Elec. Usage at Time $t-48$ hours (kWh)')
plt.ylabel('Elec. Usage at Time $t$ (kWh)')
plt.legend(loc="lower right")

fig.savefig('Elec_Lag_48hour.png')
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  USAGE   R-squared:                       0.485
Model:                            OLS   Adj. R-squared:                  0.485
Method:                 Least Squares   F-statistic:                     1602.
Date:                Tue, 13 May 2014   Prob (F-statistic):          2.32e-247
Time:                        21:17:02   Log-Likelihood:                -1448.5
No. Observations:                1701   AIC:                             2901.
Df Residuals:                    1699   BIC:                             2912.
Df Model:                           1                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.4031      0.029     13.880      0.000         0.346     0.460
USAGE_t-48     0.7020      0.018     40.029      0.000         0.668     0.736
==============================================================================
Omnibus:                       10.671   Durbin-Watson:                   0.151
Prob(Omnibus):                  0.005   Jarque-Bera (JB):               13.370
Skew:                           0.087   Prob(JB):                      0.00125
Kurtosis:                       3.398   Cond. No.                         4.55
==============================================================================
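
The five summaries can be read side by side. The sketch below copies the reported R-squared values into a dictionary and takes their square roots, which for a one-regressor fit with a positive slope gives the correlation between usage and its lag. The variable names are made up for this example, and the results should only roughly match the autocorrelation plot, since that plot may normalize its estimate slightly differently.

```python
import math

# R-squared values copied from the five OLS summaries in this notebook (lag in hours)
reported_r_squared = {1: 0.943, 6: 0.840, 12: 0.768, 24: 0.677, 48: 0.485}

# With one regressor and a positive slope, the correlation is the square root of R-squared
implied_correlation = {lag: math.sqrt(r2) for lag, r2 in reported_r_squared.items()}

for lag in sorted(implied_correlation):
    print('lag %2i h: R^2 = %.3f, implied correlation = %.3f'
          % (lag, reported_r_squared[lag], implied_correlation[lag]))
```

The implied correlation falls from about 0.97 at a 1-hour lag to about 0.70 at a 48-hour lag, so the most recent hour carries the most information about current usage.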