This IPython notebook was composed by Justin Elszasz.
The simplest way to forecast electricity consumption is to look at previous values. This notebook examines how hourly electricity consumption relates to consumption in previous hours.
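As a minimal sketch of forecasting from previous values, the simplest case is a persistence forecast, where the prediction for hour t is the observed usage at hour t-1. The `usage` values below are made up for illustration and are not the BGE data; the error metric (mean absolute error) is likewise only an assumption about how one might score such a forecast.

```python
import numpy as np

# Hypothetical hourly usage values in kWh (illustrative only, not BGE data)
usage = np.array([0.8, 0.9, 1.4, 1.2, 1.0, 0.7])

# Persistence forecast: predict usage at hour t with the value at hour t-1
forecast = usage[:-1]
actual = usage[1:]

# Mean absolute error of the naive forecast
mae = np.mean(np.abs(actual - forecast))
print("MAE of persistence forecast: %.3f kWh" % mae)
```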
import pandas as pd
import numpy as np
import import_funcs
from matplotlib import pyplot as plt
# pandas.tools.plotting was moved to pandas.plotting in later pandas releases
from pandas.plotting import autocorrelation_plot
import statsmodels.api as sm
gray_light = '#d4d4d2'
gray_med = '#737373'
red_orange = '#ff4400'
blue_light = '#00d2ff'
# Import the BGE hourly electricity data and weather data using import_funcs.py class
elec_data = import_funcs.BGEdata()
weather = import_funcs.weather()
# Merge into one Pandas dataframe
elec_and_weather = pd.merge(weather,elec_data,left_index=True,right_index=True)
# Clean up a bit
del elec_and_weather['tempm'], elec_and_weather['COST'], elec_and_weather['NOTES'], elec_and_weather['UNITS'], elec_and_weather['TYPE']
del elec_and_weather['precipm']
elec_and_weather['wspdMPH'] = elec_and_weather['wspdm'] * 0.62
del elec_and_weather['wspdm']
# Add historic usage to each X vector
# Set number of hours prediction is in advance
n_hours_advance = 1
# Set number of historic hours used
n_hours_window = 48
# need to do this for range function
#n_hours_window += 1
# Create empty lag columns for usage and each weather variable
for k in range(n_hours_advance,n_hours_advance+n_hours_window):
    elec_and_weather['USAGE_t-%i'% k] = np.zeros(len(elec_and_weather['USAGE']))
    elec_and_weather['tempF_t-%i'% k] = np.zeros(len(elec_and_weather['tempF']))
    elec_and_weather['hum_t-%i'% k] = np.zeros(len(elec_and_weather['hum']))
    elec_and_weather['wspdMPH_t-%i'% k] = np.zeros(len(elec_and_weather['wspdMPH']))

# Fill each lag column: row i gets the value observed j hours earlier
for i in range(n_hours_advance+n_hours_window,len(elec_and_weather['USAGE'])):
    for j in range(n_hours_advance,n_hours_advance+n_hours_window):
        elec_and_weather['USAGE_t-%i'% j][i] = elec_and_weather['USAGE'][i-j]
        elec_and_weather['tempF_t-%i'% j][i] = elec_and_weather['tempF'][i-j]
        elec_and_weather['wspdMPH_t-%i'% j][i] = elec_and_weather['wspdMPH'][i-j]
        elec_and_weather['hum_t-%i'% j][i] = elec_and_weather['hum'][i-j]
# Drop the leading rows that lack a full window of history (positional slice)
elec_and_weather = elec_and_weather.iloc[n_hours_advance+n_hours_window:]
#print elec_and_weather['USAGE_t-3'][:10]
# Restrict the data to 18 Jan 2014 through 31 Mar 2014
elec_and_weather = elec_and_weather['18-jan-2014 00:00:00':'31-mar-2014 23:00:00']
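The lag columns built by the nested loops above can also be produced with the pandas `shift` method, which moves a column down by a given number of rows. The sketch below is an alternative under stated assumptions, not the notebook's own method: the small `df` frame and its values are hypothetical stand-ins for `elec_and_weather`, and the window is shortened to 3 hours to keep the output readable. Rows without history come out as NaN rather than zero, but they are sliced off in the same way.

```python
import pandas as pd

# Hypothetical frame standing in for elec_and_weather (values are made up)
df = pd.DataFrame({'USAGE': [0.5, 0.7, 1.1, 0.9, 0.6, 0.8],
                   'tempF': [30.0, 31.0, 33.0, 35.0, 34.0, 32.0]})

n_hours_advance = 1
n_hours_window = 3

# shift(k) moves each value down k rows, so row i holds the value from row i-k
for k in range(n_hours_advance, n_hours_advance + n_hours_window):
    for col in ['USAGE', 'tempF']:
        df['%s_t-%i' % (col, k)] = df[col].shift(k)

# Drop the leading rows that lack a full window of history
df = df.iloc[n_hours_advance + n_hours_window:]
print(df)
```

Because `shift` works on whole columns, it avoids the element-by-element assignment in the loops and scales better to the full 48-hour window.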
An autocorrelation plot shows, for each time lag (here in hours), the ratio of the autocovariance of the series at that lag to the variance of the series.
fig = plt.figure()
plot = autocorrelation_plot(elec_and_weather['USAGE'])
fig.savefig('Elec_Autocorrelation.png')
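To make the quantity on the vertical axis of the autocorrelation plot concrete, the sketch below computes the same ratio (autocovariance at a lag divided by the variance) by hand with NumPy. The function name `autocorrelation` and the sinusoidal `usage` signal are assumptions introduced for illustration; the signal is synthetic with a 24-hour cycle and is not the BGE data.

```python
import numpy as np

def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag: autocovariance / variance."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    mean = x.mean()
    variance = np.sum((x - mean) ** 2) / n
    autocov = np.sum((x[:n - lag] - mean) * (x[lag:] - mean)) / n
    return autocov / variance

# Hypothetical signal with a 24-hour cycle (illustrative only, not BGE data)
hours = np.arange(24 * 14)
usage = 1.0 + 0.5 * np.sin(2 * np.pi * hours / 24)

for lag in (1, 12, 24):
    print("lag %2d h: r = %+.3f" % (lag, autocorrelation(usage, lag)))
```

For a purely daily cycle like this one, the autocorrelation is strongly negative at 12 hours and strongly positive at 24 hours, which is the kind of pattern a daily usage rhythm would produce in the plot.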
The following lag plots show the relationship between usage in each hour and usage a fixed number of hours earlier, together with an ordinary least squares (OLS) fit for each lag.
# 1 hour lag
model = sm.OLS(elec_and_weather['USAGE'],sm.add_constant(elec_and_weather['USAGE_t-1']))
res = model.fit()
print(res.summary())
fig = plt.figure(figsize=[5,5])
p1= plt.plot(elec_and_weather['USAGE_t-1'],elec_and_weather['USAGE'],'.',color=gray_light,alpha=1.0,markersize=8)
#p2= plt.plot([0,4],[0,4],'k')
# only plotting first and last point in linear fit to display clean line
plot_model = plt.plot([min(elec_and_weather['USAGE_t-1']),max(elec_and_weather['USAGE_t-1'])],[min(res.fittedvalues),max(res.fittedvalues)],label='OLS $R^2$=%.2f' % res.rsquared,linewidth=1,color=red_orange)
plt.xlabel('Elec. Usage at Time $t-1$ hour (kWh)')
plt.ylabel('Elec. Usage at Time $t$ (kWh)')
plt.legend(loc="lower right")
fig.savefig('Elec_Lag_1hour.png')
                            OLS Regression Results
==============================================================================
Dep. Variable:                  USAGE   R-squared:                       0.943
Model:                            OLS   Adj. R-squared:                  0.943
Method:                 Least Squares   F-statistic:                 2.835e+04
Date:                Tue, 13 May 2014   Prob (F-statistic):               0.00
Time:                        20:37:31   Log-Likelihood:                 429.95
No. Observations:                1701   AIC:                            -855.9
Df Residuals:                    1699   BIC:                            -845.0
Df Model:                           1
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.0398      0.009      4.230      0.000         0.021     0.058
USAGE_t-1      0.9717      0.006    168.386      0.000         0.960     0.983
==============================================================================
Omnibus:                      215.819   Durbin-Watson:                   2.401
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             1771.927
Skew:                           0.287   Prob(JB):                         0.00
Kurtosis:                       7.967   Cond. No.                         4.41
==============================================================================
# 6 hour lag
model = sm.OLS(elec_and_weather['USAGE'],sm.add_constant(elec_and_weather['USAGE_t-6']))
res = model.fit()
print(res.summary())
fig = plt.figure(figsize=[5,5])
p1= plt.plot(elec_and_weather['USAGE_t-6'],elec_and_weather['USAGE'],'.',color=gray_light,alpha=1.0,markersize=8)
#p2= plt.plot([0,4],[0,4],'k')
# only plotting first and last point in linear fit to display clean line
plot_model = plt.plot([min(elec_and_weather['USAGE_t-6']),max(elec_and_weather['USAGE_t-6'])],[min(res.fittedvalues),max(res.fittedvalues)],label='OLS $R^2$=%.2f' % res.rsquared,linewidth=1,color=red_orange)
plt.xlabel('Elec. Usage at Time $t-6$ hours (kWh)')
plt.ylabel('Elec. Usage at Time $t$ (kWh)')
plt.legend(loc="lower right")
fig.savefig('Elec_Lag_6hour.png')
                            OLS Regression Results
==============================================================================
Dep. Variable:                  USAGE   R-squared:                       0.840
Model:                            OLS   Adj. R-squared:                  0.840
Method:                 Least Squares   F-statistic:                     8894.
Date:                Tue, 13 May 2014   Prob (F-statistic):               0.00
Time:                        21:11:43   Log-Likelihood:                -456.90
No. Observations:                1701   AIC:                             917.8
Df Residuals:                    1699   BIC:                             928.7
Df Model:                           1
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.1137      0.016      7.148      0.000         0.082     0.145
USAGE_t-6      0.9180      0.010     94.310      0.000         0.899     0.937
==============================================================================
Omnibus:                      152.876   Durbin-Watson:                   0.665
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              875.978
Skew:                          -0.177   Prob(JB):                    6.08e-191
Kurtosis:                       6.498   Cond. No.                         4.42
==============================================================================
# 12 hour lag
model = sm.OLS(elec_and_weather['USAGE'],sm.add_constant(elec_and_weather['USAGE_t-12']))
res = model.fit()
print(res.summary())
fig = plt.figure(figsize=[5,5])
p1= plt.plot(elec_and_weather['USAGE_t-12'],elec_and_weather['USAGE'],'.',color=gray_light,alpha=1.0,markersize=8)
#p2= plt.plot([0,4],[0,4],'k')
# only plotting first and last point in linear fit to display clean line
plot_model = plt.plot([min(elec_and_weather['USAGE_t-12']),max(elec_and_weather['USAGE_t-12'])],[min(res.fittedvalues),max(res.fittedvalues)],label='OLS $R^2$=%.2f' % res.rsquared,linewidth=1,color=red_orange)
plt.xlabel('Elec. Usage at Time $t-12$ hours (kWh)')
plt.ylabel('Elec. Usage at Time $t$ (kWh)')
plt.legend(loc="lower right")
fig.savefig('Elec_Lag_12hour.png')
                            OLS Regression Results
==============================================================================
Dep. Variable:                  USAGE   R-squared:                       0.768
Model:                            OLS   Adj. R-squared:                  0.768
Method:                 Least Squares   F-statistic:                     5629.
Date:                Tue, 13 May 2014   Prob (F-statistic):               0.00
Time:                        21:11:57   Log-Likelihood:                -770.34
No. Observations:                1701   AIC:                             1545.
Df Residuals:                    1699   BIC:                             1556.
Df Model:                           1
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.1644      0.019      8.564      0.000         0.127     0.202
USAGE_t-12     0.8803      0.012     75.026      0.000         0.857     0.903
==============================================================================
Omnibus:                      137.460   Durbin-Watson:                   0.432
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              780.046
Skew:                           0.024   Prob(JB):                    4.12e-170
Kurtosis:                       6.317   Cond. No.                         4.45
==============================================================================
# 24 hour lag
model = sm.OLS(elec_and_weather['USAGE'],sm.add_constant(elec_and_weather['USAGE_t-24']))
res = model.fit()
print(res.summary())
fig = plt.figure(figsize=[5,5])
p1= plt.plot(elec_and_weather['USAGE_t-24'],elec_and_weather['USAGE'],'.',color=gray_light,alpha=1.0,markersize=8)
#p2= plt.plot([0,4],[0,4],'k')
# only plotting first and last point in linear fit to display clean line
plot_model = plt.plot([min(elec_and_weather['USAGE_t-24']),max(elec_and_weather['USAGE_t-24'])],[min(res.fittedvalues),max(res.fittedvalues)],label='OLS $R^2$=%.2f' % res.rsquared,linewidth=1,color=red_orange)
plt.xlabel('Elec. Usage at Time $t-24$ hours (kWh)')
plt.ylabel('Elec. Usage at Time $t$ (kWh)')
plt.legend(loc="lower right")
fig.savefig('Elec_Lag_24hour.png')
                            OLS Regression Results
==============================================================================
Dep. Variable:                  USAGE   R-squared:                       0.677
Model:                            OLS   Adj. R-squared:                  0.677
Method:                 Least Squares   F-statistic:                     3560.
Date:                Tue, 13 May 2014   Prob (F-statistic):               0.00
Time:                        21:17:10   Log-Likelihood:                -1052.5
No. Observations:                1701   AIC:                             2109.
Df Residuals:                    1699   BIC:                             2120.
Df Model:                           1
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.2372      0.023     10.440      0.000         0.193     0.282
USAGE_t-24     0.8248      0.014     59.664      0.000         0.798     0.852
==============================================================================
Omnibus:                      125.400   Durbin-Watson:                   0.247
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              550.153
Skew:                           0.191   Prob(JB):                    3.43e-120
Kurtosis:                       5.760   Cond. No.                         4.47
==============================================================================
# 48 hour lag
model = sm.OLS(elec_and_weather['USAGE'],sm.add_constant(elec_and_weather['USAGE_t-48']))
res = model.fit()
print(res.summary())
fig = plt.figure(figsize=[5,5])
p1= plt.plot(elec_and_weather['USAGE_t-48'],elec_and_weather['USAGE'],'.',color=gray_light,alpha=1.0,markersize=8)
#p2= plt.plot([0,4],[0,4],'k')
# only plotting first and last point in linear fit to display clean line
plot_model = plt.plot([min(elec_and_weather['USAGE_t-48']),max(elec_and_weather['USAGE_t-48'])],[min(res.fittedvalues),max(res.fittedvalues)],label='OLS $R^2$=%.2f' % res.rsquared,linewidth=1,color=red_orange)
plt.xlabel('Elec. Usage at Time $t-48$ hours (kWh)')
plt.ylabel('Elec. Usage at Time $t$ (kWh)')
plt.legend(loc="lower right")
fig.savefig('Elec_Lag_48hour.png')
                            OLS Regression Results
==============================================================================
Dep. Variable:                  USAGE   R-squared:                       0.485
Model:                            OLS   Adj. R-squared:                  0.485
Method:                 Least Squares   F-statistic:                     1602.
Date:                Tue, 13 May 2014   Prob (F-statistic):          2.32e-247
Time:                        21:17:02   Log-Likelihood:                -1448.5
No. Observations:                1701   AIC:                             2901.
Df Residuals:                    1699   BIC:                             2912.
Df Model:                           1
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          0.4031      0.029     13.880      0.000         0.346     0.460
USAGE_t-48     0.7020      0.018     40.029      0.000         0.668     0.736
==============================================================================
Omnibus:                       10.671   Durbin-Watson:                   0.151
Prob(Omnibus):                  0.005   Jarque-Bera (JB):               13.370
Skew:                           0.087   Prob(JB):                      0.00125
Kurtosis:                       3.398   Cond. No.                         4.55
==============================================================================
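Across the five fits above, R-squared falls from 0.943 at a 1 hour lag to 0.485 at a 48 hour lag. The sketch below shows one way to compute that comparison in a single loop instead of one cell per lag. It relies on the fact that, for a one-variable OLS fit with an intercept, R-squared equals the squared Pearson correlation between the two variables. The helper name `lag_r_squared` and the synthetic AR(1) `usage` series are assumptions made for illustration; the numbers it prints are not the BGE results.

```python
import numpy as np

def lag_r_squared(series, lag):
    """R^2 of a one-variable OLS fit (with intercept) of x[t] on x[t-lag].

    For simple linear regression this equals the squared Pearson correlation.
    """
    x = np.asarray(series, dtype=float)
    r = np.corrcoef(x[:-lag], x[lag:])[0, 1]
    return r ** 2

# Hypothetical usage-like series: an AR(1) process (illustrative, not BGE data)
rng = np.random.RandomState(0)
usage = np.zeros(2000)
for t in range(1, len(usage)):
    usage[t] = 0.95 * usage[t - 1] + rng.normal(scale=0.1)

for lag in (1, 6, 12, 24, 48):
    print("lag %2d h: R^2 = %.3f" % (lag, lag_r_squared(usage, lag)))
```

With the real data, the same loop could be pointed at the `USAGE` column; the full statsmodels summaries remain useful when the coefficients and diagnostics are needed as well.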