# Relationship between WTI Crude Oil and Henry Hub Natural Gas¶

#### Author:¶

Rajan Subramanian
Commodity Quant Research Intern
Energy/Metals Group

#### Introduction¶



This project examines the time series relationship between the Henry Hub natural gas price and the West Texas Intermediate (WTI) crude oil price. This relationship has traditionally been approached using correlation and deterministic trends. Correlation analysis is based on asset returns, so it typically begins by detrending the price data. Detrending removes the long-term trend, which makes it impossible to base any decision on common trends in prices.

In the past, many studies established that crude oil and natural gas prices are cointegrated. Yet recent studies have noted that the two price series have 'decoupled': the prices have appeared to move independently of each other. Natural gas prices are more volatile (roughly twice as volatile) than crude oil prices. According to recent literature, the cointegrating relationship does not appear stable over time: from 1989-2005 the two prices trended upward together and were cointegrated, but after 2005 they seem to have decoupled.

Economic theory suggests that the two commodities are related: they tend to be substitutes in production and consumption. Past changes in crude oil prices drove changes in natural gas prices, but the converse did not occur. This is because the crude oil price is determined in the world market while the natural gas market is regionally segmented. The domestic natural gas market is much smaller than the global crude oil market and hence less likely to influence the global price of oil.

The following are the economic factors linking Natural Gas and Crude Oil Market:

1) An increase in the crude oil price motivates consumers to substitute natural gas for petroleum products in consumption, which increases natural gas demand and hence prices.
2) An increase in the crude oil price resulting from increased crude oil demand may increase the amount of natural gas produced as a co-product of oil, which would decrease natural gas prices.
3) An increase in crude oil prices resulting from increased crude oil demand may raise the costs of natural gas production and development, putting upward pressure on natural gas prices.
4) An increase in crude oil prices resulting from increased crude oil demand may lead to more drilling and development of natural gas projects, which would tend to increase production and decrease natural gas prices.

#### Purpose¶

The purpose of this project is to better characterize the relationship between crude oil and natural gas. Specifically, we will do the following:

1) Examine the time series properties of natural gas and crude oil.
2) Estimate a VAR model to identify the fundamental tie (the cointegrating equation). This is the relationship that is re-established whenever the two prices move away from each other.
3) Determine the error correction mechanism. Whenever the natural gas price has been pulled away from the fundamental tie, it will predictably drift back toward it; this step estimates the rate at which that drift occurs.
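The fundamental tie and the error-correction step in 2)-3) can be written out explicitly. A generic bivariate sketch (the symbols are placeholders, not estimates from this project):

```latex
% Fundamental tie (cointegrating equation) between log nat gas g_t and log WTI w_t:
%   z_t = g_t - beta*w_t - c is stationary (the "disequilibrium")
% Error-correction mechanism: alpha_g < 0 is the rate at which nat gas
% drifts back toward the tie after being pulled away from it.
\begin{aligned}
z_t        &= g_t - \beta\, w_t - c \\
\Delta g_t &= \alpha_g\, z_{t-1}
              + \sum_{i=1}^{p} \gamma_i\, \Delta g_{t-i}
              + \sum_{i=1}^{p} \delta_i\, \Delta w_{t-i} + \epsilon_{g,t} \\
\Delta w_t &= \alpha_w\, z_{t-1}
              + \sum_{i=1}^{p} \phi_i\, \Delta g_{t-i}
              + \sum_{i=1}^{p} \psi_i\, \Delta w_{t-i} + \epsilon_{w,t}
\end{aligned}
```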

#### Data Munging and Gathering¶

The data for daily crude oil and natural gas prices are obtained from the Bloomberg Terminal. Let's load the necessary libraries:

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import statsmodels.graphics.tsaplots as tsaplots
import statsmodels.api as sm
from pylab import rcParams
#some useful R import functions
import rpy2.robjects as robj
import pandas.rpy.common as com
from rpy2.robjects.packages import importr
import scipy.stats as sp
stats=importr('stats')
TSA = importr('TSA')
urca = importr('urca')
VARS = importr('vars')
%pylab inline
import stat2models as st2

Populating the interactive namespace from numpy and matplotlib



Now we will import the necessary data, which is stored in an Excel file. Before we plot, let's munge the data to make the analysis easier.

In [2]:
#read data from excel file
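The body of the cell above is missing apart from its comment. A self-contained sketch of what the load might look like, using synthetic frames with the column names used later in the notebook (`wti_price`, `ng1_price`); the file name and sheet names in the commented call are placeholders, not the actual Bloomberg extract:

```python
import pandas as pd

# The actual Excel load would be something along these lines
# (file/sheet names are hypothetical):
#   wti    = pd.read_excel('prices.xlsx', sheet_name='WTI',    index_col=0)
#   natgas = pd.read_excel('prices.xlsx', sheet_name='NatGas', index_col=0)

# For a runnable illustration, build frames with the same shape instead:
idx = pd.bdate_range('1998-01-29', periods=5)  # business-day DatetimeIndex
wti = pd.DataFrame({'wti_price': [16.1, 16.3, 16.0, 16.4, 16.2]}, index=idx)
natgas = pd.DataFrame({'ng1_price': [2.20, 2.25, 2.18, 2.30, 2.27]}, index=idx)
```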


Plot of WTI Crude Oil and Natural Gas Raw Prices

In [3]:
rcParams['figure.figsize'] = 12,4
axe = wti.plot(color='black')
we = natgas.plot(secondary_y=True,style='orange',ax=axe)
we.legend(loc='best')
axe.set_ylabel('Price (Dollars Per Barrel)')
axe.right_ax.set_ylabel('Price (Dollars per MMBtu)')
axe.set_title('Daily WTI/Nat Gas Prices')
axe.legend(loc='upper left')

Out[3]:
<matplotlib.legend.Legend at 0x7fde1ecb1c50>

#### Analysis¶

From the figure at the top, we can see that natural gas is more volatile than WTI. Just from looking at the graphs, the correlation between WTI and natural gas appears high and positive. Since the data consist of daily prices, there may be too much noise in the daily data. Let's resample monthly: monthly prices contain enough information to capture short-run movements over time, so we will choose the first price of each month for the commodity. We also apply a log transformation to both WTI and natural gas, for three reasons. 1) Under geometric Brownian motion,

$\log(S_t / S_{t-1}) \sim N(\mu t, \sigma^2 t)$

that is, it is the log returns that are normally distributed. 2) Log returns are easier to work with, since the h-period log return is the sum of h consecutive one-period returns; this leads to the square-root-of-time rule. 3) Log transformations can remove scale effects and reduce possible effects of heteroskedasticity.
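The additivity in point 2) is easy to verify numerically; a small sketch with illustrative prices (not data from this project):

```python
import numpy as np

prices = np.array([50.0, 52.0, 51.0, 55.0])      # illustrative price path
one_period = np.diff(np.log(prices))             # one-period log returns
h_period = np.log(prices[-1] / prices[0])        # 3-period log return

# the h-period log return is the sum of the h consecutive one-period returns
assert np.isclose(one_period.sum(), h_period)

# under i.i.d. returns, variance scales with h, so volatility scales with
# sqrt(h) -- the "square-root-of-time rule": sd_h = sd_1 * sqrt(h)
```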

Finally, as crude prices are generally higher than natural gas prices by at least a factor of 2, we scale down the log price of WTI by subtracting the mean of the log price of natural gas. This allows direct comparison of the two prices. That is:

$G_t^{wti} = \log(WTI_t) - \mathrm{mean}(\log(NatGas_t))$


In [4]:
#resampling to monthly frequencies
#wti = wti.resample('W-MON',how='mean')
#natgas = natgas.resample('W-MON',how='mean')

#subtracting mean of log(Natgasprice) from log(wti)
Gt_wti = np.log(wti['wti_price'])-np.mean(np.log(natgas['ng1_price']))
xt_wti = Gt_wti.diff().dropna() #stores the log(returns of wti)
Gt_natgas = np.log(natgas['ng1_price'])
xt_natgas = Gt_natgas.diff().dropna() #stores the log(returns of nat gas)

In [5]:
#plotting the data
rcParams['figure.figsize'] = 12,4
plt.plot(Gt_wti.index,Gt_wti.values,color='black',label='WTI Crude Oil Log(Price)')
plt.plot(Gt_natgas.index,Gt_natgas.values,color='orange',label='Henry Hub Natural Gas log(Price)')
plt.legend(loc='best')
plt.ylabel('log price (Dollars)')
plt.xlabel('Date')
plt.title('Daily Nat Gas Log Prices & Mean-Adjusted log(WTI) 01/29/1998-5/20/2015')

Out[5]:
<matplotlib.text.Text at 0x7fde1f1e6e10>

###### Analysis¶

From visual inspection, we can see that nat gas prices are more volatile than crude oil price.
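The volatility claim can be quantified as the annualized standard deviation of the log returns. A self-contained sketch on simulated daily returns (stand-ins for `xt_wti`/`xt_natgas`, with nat gas drawn at roughly twice the daily sigma to mirror the visual impression; the numbers are illustrative assumptions, not estimates from this data):

```python
import numpy as np

rng = np.random.default_rng(0)
# simulated stand-ins for the notebook's xt_wti / xt_natgas return series
xt_wti_sim = rng.normal(0.0, 0.02, size=2500)
xt_natgas_sim = rng.normal(0.0, 0.04, size=2500)

def ann_vol(r, periods=252):
    """Annualized volatility from per-period log returns."""
    return r.std(ddof=1) * np.sqrt(periods)

# ratio of annualized vols; ~2 by construction here
ratio = ann_vol(xt_natgas_sim) / ann_vol(xt_wti_sim)
```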
%R cat('ng1 order select', ar(y_ng_ret,demean=FALSE)$order)

wti order select 6 ng1 order select 1

We can see that the automatic selection picks an AR(6) for the log returns of WTI and an AR(1) for the log returns of natural gas, while the initial observation of the PACF graphs suggested orders of AR(9) and AR(10) respectively. Let's check the residuals of the models:

In [29]:
rcParams['figure.figsize'] = 11,8
fig,axes = plt.subplots(2,1)
model_wti1 = sm.tsa.ARMA(xt_wti.values,order=(9,0)).fit(trend='nc') #based on PACF criteria. AIC leads to corr at lag 13
model_natgas = sm.tsa.ARMA(xt_natgas.values,order=(10,0)).fit(trend='nc') #based on PACF only
tsaplots.plot_acf(model_wti1.resid,lags=24,ax=axes[0]);
tsaplots.plot_acf(model_natgas.resid,lags=24,ax=axes[1]);
axes[0].set_title('ACF log(WTI returns) residuals')
axes[1].set_title('ACF log(NatGas Returns) residuals')

Out[29]:
<matplotlib.text.Text at 0x7fde1557b3d0>

##### Analysis¶

We can see that the two models for WTI and natural gas fit pretty well. The ACFs of the residuals do not show any significant lags at the 5% level, although for WTI there appears to be minor significance around lag 13. This is to be expected: testing at the 5% level, roughly 1 in 20 residual autocorrelations will appear significant purely by chance.

In [30]:
#ljung box test on the residuals
print "ljung box test on wti residuals \n"
st2.boxTest(model_wti1.resid)
print '\n'
print "ljung box test on nat gas residuals \n"
st2.boxTest(model_natgas.resid)

ljung box test on wti residuals
Xsquared 0.0440559665223 pvalue 0.999999999958
Xsquared 17.3472495506 pvalue 0.630323779628

ljung box test on nat gas residuals
Xsquared 0.0201591143789 pvalue 0.999999999999
Xsquared 19.7019525422 pvalue 0.476707367819

#### ADF Test & Determining the trend and intercept for WTI and NatGas¶

We perform two forms of the ADF test, motivated by visual observation of the data.
Each form differs in the deterministic component of the trend:

$\Delta x_t = \beta_0 + \phi_1 x_{t-1} + \sum\limits_{i=1}^{p}\gamma_i \Delta x_{t-i} + \epsilon_t$

$\Delta x_t = \beta_0 + \beta_1 t + \phi_1 x_{t-1} + \sum\limits_{i=1}^{p}\gamma_i \Delta x_{t-i} + \epsilon_t$

Following Elder/Kennedy's methodology to determine whether there is a trend or intercept in the data:

#### Growth of the series is not known:¶

Assume our regression is of the form:

$\Delta x_t = \beta_0 + \beta_1 t + \phi_1 x_{t-1} + \sum\limits_{i=1}^{p}\gamma_i \Delta x_{t-i} + a_t$

Ignoring for the moment that $\beta_1$ may be 0, we conduct the ADF test for stationarity. If the data are not stationary, $\beta_1 = 0$ follows. Given the unit root, we then test for the presence of an intercept by regressing $\Delta x_t$ against the remaining variables on the right-hand side of the equation above.

##### ADF test¶

In [31]:
st2.print_adfuller(Gt_wti.values,title='Log(WTI)-based on PACF',maxlag=9,regression='ct') # Based on PACF observation
print('\n')
st2.print_adfuller(Gt_wti.values,title='Log(WTI)-based on t-stat',regression='ct',autolag='t-stat') # Based on t-stat
print('\n')
st2.print_adfuller(Gt_wti.values,title='Log(WTI)-based on AIC',regression='ct',autolag='AIC') # Based on AIC
print('\n')
st2.print_adfuller(xt_wti.values,title='Log(WTI) Returns',maxlag=9,regression='nc') #checking if returns are stationary
print('\n')
st2.print_adfuller(Gt_natgas.values,title='Log(NatGas) based on PACF',maxlag=10,regression='ct') #Based on PACF observation
print('\n')
st2.print_adfuller(Gt_natgas.values,title='Log(NatGas) based on t-stat',regression='ct',autolag='t-stat')
print('\n')
st2.print_adfuller(Gt_natgas.values,title='Log(NatGas) based on AIC',regression='ct',autolag='AIC')
print('\n')
st2.print_adfuller(xt_natgas.values,title='Log(NatGas) Returns',maxlag=10,regression='nc')

Augmented Dickey Fuller Test
commodity : Log(WTI)-based on PACF
Dickey-Fuller = -1.63181410507, Lag order = 9, p value = 0.779727778289
alternative hypothesis: stationary

Augmented Dickey Fuller Test
commodity : Log(WTI)-based on t-stat
Dickey-Fuller = -2.05244956839, Lag order = 27, p value = 0.572598782289
alternative hypothesis: stationary

Augmented Dickey Fuller Test
commodity : Log(WTI)-based on AIC
Dickey-Fuller = -1.75937402162, Lag order = 6, p value = 0.724010223777
alternative hypothesis: stationary

Augmented Dickey Fuller Test
commodity : Log(WTI) Returns
Dickey-Fuller = -22.2735916248, Lag order = 9, p value = 0.0
alternative hypothesis: stationary

Augmented Dickey Fuller Test
commodity : Log(NatGas) based on PACF
Dickey-Fuller = -2.46585523458, Lag order = 10, p value = 0.345238275513
alternative hypothesis: stationary

Augmented Dickey Fuller Test
commodity : Log(NatGas) based on t-stat
Dickey-Fuller = -2.30604490265, Lag order = 27, p value = 0.430609222673
alternative hypothesis: stationary

Augmented Dickey Fuller Test
commodity : Log(NatGas) based on AIC
Dickey-Fuller = -2.46008646895, Lag order = 1, p value = 0.348196213139
alternative hypothesis: stationary

Augmented Dickey Fuller Test
commodity : Log(NatGas) Returns
Dickey-Fuller = -20.1005645203, Lag order = 10, p value = 0.0
alternative hypothesis: stationary

##### Analysis¶

We see that the presence of a unit root is not rejected, i.e., we accept that each levels series is not stationary. Our testing of stationarity is done, and hence $\beta_1 = 0$. Next, we test for the presence of an intercept by regressing the change in the series against an intercept:

$\Delta x_t = \beta_0 + \sum\limits_{i=1}^{p}\gamma_i \Delta x_{t-i} + a_t$

Notice that this is just a traditional AR model on the differences. Hence, we fit an AR(9) model for WTI and an AR(10) for natural gas (the PACF-based orders used above) to determine the t statistic on the intercept coefficient.
We can run this regression because the log returns are stationary.

In [32]:
m1 = st2.ar_ols(xt_wti.values,lags=9)
m2 = st2.ar_ols(xt_natgas.values,lags=10)
print 'wti_const = ', m1.params[0]
print 'wti_p value = ', m1.pvalues[0]
print '\n'
print 'natgas_const = ', m2.params[0]
print 'natgas_p value = ', m2.pvalues[0]

wti_const =  0.000332944049915
wti_p value =  0.363800928384

natgas_const =  7.92473967396e-05
natgas_p value =  0.880670352897

Hence we accept that the drift is 0 and that neither series has a trend: both series are I(1) with no drift or trend terms.

In [33]:
#finally, using the Phillips-Perron test to confirm the unit root in the levels:
from arch.unitroot import PhillipsPerron
print "wti prices"
print(PhillipsPerron(Gt_wti,trend='nc',lags=9))
print '\n'
print 'nat gas prices'
print(PhillipsPerron(Gt_natgas,trend='nc',lags=10))

wti prices
Phillips-Perron Test (Z-tau)
=====================================
Test Statistic                  0.360
P-value                         0.791
Lags                                9
-------------------------------------
Trend: No Trend
Critical Values: -2.57 (1%), -1.94 (5%), -1.62 (10%)
Null Hypothesis: The process contains a unit root.
Alternative Hypothesis: The process is weakly stationary.

nat gas prices
Phillips-Perron Test (Z-tau)
=====================================
Test Statistic                 -0.619
P-value                         0.447
Lags                               10
-------------------------------------
Trend: No Trend
Critical Values: -2.57 (1%), -1.94 (5%), -1.62 (10%)
Null Hypothesis: The process contains a unit root.
Alternative Hypothesis: The process is weakly stationary.

The Phillips-Perron test also confirms that both original series are I(1).

## Part II: Co-Integration Analysis¶

Two time series $X_t$ and $Y_t$ are co-integrated if each is I(1) and there exists a $\lambda$ such that $Z_t = X_t - \lambda Y_t$ is stationary. The process $Z_t$ is called the disequilibrium because it captures deviations from the long-term equilibrium. Generally, price data is detrended before any analysis even begins.
Hence, the long-term trend is generally removed, making it impossible to base any decision on common stochastic trends. In cointegration analysis, we first test whether there is a common stochastic trend in the variables. If a set of prices shares a common stochastic trend, they must have a long-term equilibrium relationship. The second goal of cointegration analysis is to capture this equilibrium in a dynamic correlation analysis. Hence, we have a two-step procedure:

a) The long-term equilibrium relationship between the prices is established. Statistical tests for cointegration are applied, and if cointegration is present, we identify the stationary linear combination of prices which best describes the long-term equilibrium relationship between them.
b) The long-term equilibrium is used in an error correction model (ECM) of returns. ECMs explain how short-term deviations from equilibrium are corrected.

##### a) Establishing Long Term Relationship (Engle-Granger Methodology)¶

We can see from the very first picture at the top that the two series seem cointegrated. Below we regress log(nat gas price) against log(crude oil price):

$log(natgas_t) = \beta_0 + \beta_1 log(wti_t) + \epsilon_t$

The above regression model determines whether the two series are cointegrated: once we run the regression and fit the model, we subtract the fitted combination of log WTI from log nat gas, that is:

$log(natgas_t) - \beta_1 log(wti_t) - \beta_0 = \epsilon_t$

Hence, $\beta_1$ is the parameter needed to form a stationary combination of the two series, and inspecting the residuals on the right-hand side indicates whether that combination is stationary. Engle and Granger propose applying a unit root test to the residuals from the static regression above. The unit root test regression on the residuals is run without the deterministic terms (constant + trend).
According to Phillips and Ouliaris, ADF and Phillips-Perron unit root tests applied to the residuals do not follow the usual Dickey-Fuller distribution. Because of the spurious regression problem, the ADF and PP statistics have asymptotic distributions that are functions of a Wiener process and depend on the deterministic terms and on the number of variables on the right-hand side. Hence, we use the Phillips-Ouliaris (PO) test on the residuals to determine whether the two series are cointegrated. Also, following Hansen (1992), the PO distributions of the ADF and PP tests applied to the residuals depend on the trend behavior as follows:

1) If the two series are both I(1) without drift, use a regression with just a constant, with dimension parameter n-1.
2) If the independent variable is I(1) with drift while the dependent one may or may not be I(1) with drift, use a regression with constant plus trend in the PO test, with dimension parameter n-2. If n-2 = 0, use the DF distribution adjusted for constant and trend.
3) If the dependent variable is I(1) with drift and the independent one is I(1) without drift, use constant plus trend in the regression and the PO test with dimension n-1.

Since we already know that both our series are I(1) without drift, we use the first case and specify just the constant:

In [34]:
prices = ['log(natgas)','log(wti)'] #empirically, wti causes nat gas
Gt_data = pd.DataFrame(np.column_stack([Gt_natgas.values,Gt_wti.values]),index=Gt_natgas.index,columns=prices)
xt_data = Gt_data.diff().dropna()
returns = ['log(natgas)return','log(wti)return']
xt_data.columns = returns
r_df = com.convert_to_r_dataframe(Gt_data)
%Rpush r_df
%R y = ts(r_df)
%R print(summary(ca.po(y,lag='long',type='Pz',demean='constant')))

########################################
# Phillips and Ouliaris Unit Root Test #
########################################

Test of type Pz detrending of series with constant only

Response log.natgas. :

Call: lm(formula = log.natgas. ~ zr)

Residuals:
     Min       1Q   Median       3Q      Max
-0.19643 -0.01977 -0.00053  0.01854  0.32584

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.0040358  0.0022919   1.761   0.0783 .
zrlog.natgas. 0.9965988  0.0013057 763.244   <2e-16 ***
zrlog.wti.    0.0004305  0.0009415   0.457   0.6475
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03485 on 4356 degrees of freedom
Multiple R-squared: 0.9941, Adjusted R-squared: 0.9941
F-statistic: 3.658e+05 on 2 and 4356 DF, p-value: < 2.2e-16

Response log.wti. :

Call: lm(formula = log.wti. ~ zr)

Residuals:
      Min        1Q    Median        3Q       Max
-0.166412 -0.012778  0.000534  0.013249  0.163391

Coefficients:
               Estimate Std. Error  t value Pr(>|t|)
(Intercept)   2.871e-03  1.589e-03    1.806   0.0709 .
zrlog.natgas. 2.792e-05  9.055e-04    0.031   0.9754
zrlog.wti.    9.989e-01  6.530e-04 1529.824   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02417 on 4356 degrees of freedom
Multiple R-squared: 0.9985, Adjusted R-squared: 0.9985
F-statistic: 1.469e+06 on 2 and 4356 DF, p-value: < 2.2e-16

Value of test-statistic is: 24.5533

Critical values of Pz are:
                  10pct    5pct    1pct
critical values 47.5877 55.2202 71.9273

The two series are not cointegrated according to the Phillips-Ouliaris test: the PO test gives a test statistic of 24.5533, which is below the 10% critical value of 47.5877. Hence we do not reject the null of no cointegration, at least in this data period.
Next we plot the residuals from our static regression, along with their ACF and PACF:

In [35]:
rcParams['figure.figsize'] = 14,3
fig,axes = plt.subplots(1,3)
n = len(Gt_natgas)
t = np.arange(0,n)
X = np.column_stack((np.ones(n),Gt_wti.values))
res = sm.OLS(Gt_natgas.values,X).fit() # includes a constant: y(t) = a + bx(t) + e_t
axes[0].plot(res.resid,color='firebrick')
axes[0].set_title('residual plot')
tsaplots.plot_acf(res.resid,lags=24,ax=axes[1]);
axes[1].set_title('ACF residuals')
tsaplots.plot_pacf(res.resid,lags=24,ax=axes[2]);
axes[2].set_title('PACF residuals')

Out[35]:
<matplotlib.text.Text at 0x7fde15142310>

#### Analysis¶

The figure above shows the residual plot from the cointegrating regression (with a constant) together with the ACF and PACF of the residuals. We see a slowly declining ACF and a sudden drop in the PACF. From these plots, the residual series does not look stationary.

In [36]:
from statsmodels.stats.stattools import durbin_watson
print "Rsquared: ", res.rsquared
print "DW statistic: ", durbin_watson(res.resid)

Rsquared:  0.203251952962
DW statistic:  0.00696216809669

### Analysis¶

We notice a small DW statistic and a low R-squared, and the ACF plot of the residuals shows slowly declining correlation. The Phillips-Ouliaris test shows that the data are not cointegrated; hence the parameters obtained from the static regression are spurious (nonsense). This is, in essence, the Engle-Granger methodology:

1) Test the individual variables $X_{t}$ and $Y_{t}$ for unit roots.
2) Run the static cointegrating regression: $Y_{t} = \beta_0 + \beta_1 X_{t} + \epsilon_t$
3) Test for no cointegration by testing for a unit root in the residuals $\epsilon_t$. The unit root regression for the residuals is run without the deterministic terms (constant, or constant and trend).
4) If no-cointegration is rejected, proceed to a dynamic ECM model.
We apply an OLS regression of one integrated variable on another and apply the Engle-Granger test to the residuals. However, Phillips and Ouliaris show that ADF unit root tests applied to the estimated cointegrating residuals do not have the usual Dickey-Fuller distribution under the null of no cointegration; they follow the Phillips-Ouliaris (PO) distributions instead. We can also confirm using the Phillips-Ouliaris cointegration test (an implementation of, and improvement over, the Engle-Granger methodology). Finally, before proceeding to these tests, we cover the Johansen procedure, which is another form of cointegration testing.

#### Johansen Procedure¶

The basic steps are the following:

1) Specify and estimate a VAR(p) model for $Y_t$.
2) Construct likelihood ratio tests for the rank of the coefficient matrix to determine the number of cointegrating vectors.
3) Impose normalization and identifying restrictions on the cointegrating vectors.
4) Given the cointegrating vectors, estimate the cointegrated VECM by maximum likelihood.

Again, there are special cases depending on whether the data have a restricted constant or a trend.
Refer to Eric Zivot's chapter 12, Cointegration.

In [37]:
%R print(summary(ca.jo(y,type='eigen',ecdet='const')))
%R print(summary(ca.jo(y,type='trace',ecdet='const')))

######################
# Johansen-Procedure #
######################

Test type: maximal eigenvalue statistic (lambda max), without linear trend and constant in cointegration

Eigenvalues (lambda):
[1] 1.568201e-03 8.862071e-04 2.168404e-19

Values of teststatistic and critical values of test:

         test 10pct  5pct  1pct
r <= 1 | 3.86  7.52  9.24 12.97
r = 0  | 6.84 13.75 15.67 20.20

Eigenvectors, normalised to first column: (These are the cointegration relations)

               log.natgas..l2 log.wti..l2  constant
log.natgas..l2      1.0000000    1.000000  1.000000
log.wti..l2        -0.1592085   -2.656973 -4.149159
constant           -1.1045779    5.654689  3.126602

Weights W: (This is the loading matrix)

              log.natgas..l2  log.wti..l2      constant
log.natgas..d   -0.003289303 0.0000507866  1.792455e-19
log.wti..d      -0.000399533 0.0004266265 -7.171711e-20

######################
# Johansen-Procedure #
######################

Test type: trace statistic, without linear trend and constant in cointegration

Eigenvalues (lambda):
[1] 1.568201e-03 8.862071e-04 2.168404e-19

Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 1 |  3.86  7.52  9.24 12.97
r = 0  | 10.70 17.85 19.96 24.60

Eigenvectors, normalised to first column: (These are the cointegration relations)

               log.natgas..l2 log.wti..l2  constant
log.natgas..l2      1.0000000    1.000000  1.000000
log.wti..l2        -0.1592085   -2.656973 -4.149159
constant           -1.1045779    5.654689  3.126602

Weights W: (This is the loading matrix)

              log.natgas..l2  log.wti..l2      constant
log.natgas..d   -0.003289303 0.0000507866  1.792455e-19
log.wti..d      -0.000399533 0.0004266265 -7.171711e-20

#### Analysis¶

There is no cointegration over 1998-2015. Note that you cannot apply the standard ADF critical values to the residuals; testing the stationarity of the residuals follows the Phillips-Ouliaris distribution.
The PO test used above also has the benefit that it does not matter which variable is the dependent variable in the regression equation. We can clearly see that including a constant in our regression still leads to no cointegration. We conclude that, for the period 1998-2015, WTI crude and natural gas are not cointegrated. It appears that structural breaks in the data might affect the outcome of the test: from 2010 on there seems to be a structural break, after which the two prices diverged. Perhaps more time needs to pass to see whether the relationship eventually re-establishes itself. Also, tests that apply ADF-type tests to the residuals are quite sensitive. For example, recall that an AR(1) process

$x_t = \beta_0 + \beta_1 x_{t-1} + \epsilon_t$

is stationary if and only if $|\beta_1| < 1$. Subtracting $x_{t-1}$ from both sides gives

$\Delta x_t = \beta_0 + \pi x_{t-1} + \epsilon_t$, where $\pi = \beta_1 - 1$

Hence, testing for a unit root is analogous to testing $\pi = 0$. If the estimate is very close to 0 (e.g., 0.000056), there is little to distinguish it from 0. The bottom line is that when roots are close to unity, some tests may fail while others pass.

#### TimeSpan¶

According to the literature, WTI crude oil and natural gas were cointegrated in the past. Looking at the figure, a possible relationship might have existed from 1998-2009. After 2009, the spread between the two commodities appears to have increased.
Let's run our tests on these date ranges: 1998-2006, 1998-01/2009, 2009-current.

In [38]:
def filter_data(wti,natgas,start,end):
    #return the wti and natgas log prices corresponding to the start and end dates
    x = np.log(wti.ix[start:end,'wti_price']) - np.mean(np.log(natgas.ix[start:end,'ng1_price']))
    y = np.log(natgas.ix[start:end,'ng1_price'])
    return x,y

# Eye Test¶

In [39]:
plt.plot(Gt_data.values)
plt.plot((3045,3045),(0,4),'r-')
plt.plot((2620,2620),(0,4),color='orange')
plt.plot((2760,2760),(0,4),'b-')
print(Gt_data.index[3045])
print(Gt_data.index[2620])
print(Gt_data.index[2760])

2010-03-04 00:00:00
2008-06-25 00:00:00
2009-01-14 00:00:00

The red line marks the end of the cointegrating relationship (found through trial and error), the blue line indicates the beginning of the decoupling effect, and the orange line marks the beginning of the financial crisis.

In [40]:
crude,ng1 = filter_data(wti,natgas,start='1998',end='03-04-2010') #filter the time period. From 06-29-2009: no co-integration
fig,ax = plt.subplots(1,2)
rcParams['figure.figsize'] = 12,5
tsaplots.plot_pacf(crude.diff().dropna(),lags=24,ax=ax[0]); #plot the pacf for the two commodities
tsaplots.plot_pacf(ng1.diff().dropna(),lags=24,ax=ax[1]);

In [41]:
from arch.unitroot import ADF
#based on Elder/Kennedy method to determine if the data has a drift + trend.
print(ADF(crude.values,trend='ct',lags=9)) #based on pacf
print '\n'
print(ADF(ng1.values,trend='ct',lags=5))

Augmented Dickey-Fuller Results
=====================================
Test Statistic                 -2.402
P-value                         0.378
Lags                                9
-------------------------------------
Trend: Constant and Linear Time Trend
Critical Values: -3.96 (1%), -3.41 (5%), -3.13 (10%)
Null Hypothesis: The process contains a unit root.
Alternative Hypothesis: The process is weakly stationary.
Augmented Dickey-Fuller Results
=====================================
Test Statistic                 -2.379
P-value                         0.391
Lags                                5
-------------------------------------
Trend: Constant and Linear Time Trend
Critical Values: -3.96 (1%), -3.41 (5%), -3.13 (10%)
Null Hypothesis: The process contains a unit root.
Alternative Hypothesis: The process is weakly stationary.

In [42]:
#we accept the data is integrated. Hence, beta_1 = 0. Now we regress the difference with just a constant:
m1 = st2.ar_ols(crude.diff().dropna(),9)
m2 = st2.ar_ols(ng1.diff().dropna(),5)
print 'wti_const = ', m1.params[0]
print 'wti_p value = ', m1.pvalues[0]
print '\n'
print 'natgas_const = ', m2.params[0]
print 'natgas_p value = ', m2.pvalues[0]

wti_const =  0.000612007260768
wti_p value =  0.200041037015

natgas_const =  0.000293656599387
natgas_p value =  0.666964645193

In [43]:
#accept that there is no intercept. Next we do the ljung box test to check the validity of the model:
st2.boxTest(m1.resid)

Xsquared 0.0742197904134 pvalue 0.999999999431
Xsquared 13.6090348936 pvalue 0.849752840782

In [44]:
st2.boxTest(m2.resid)

Xsquared 6.39847540417 pvalue 0.780748257887
Xsquared 21.839151329 pvalue 0.349302150232

Hence, there is no trend and no constant in our model, and we accept that the data are integrated during this time period as well. The Ljung-Box tests on the residuals show no correlation at either the shorter or longer lags, so we accept that there is no correlation in the residuals. We confirm using the Phillips-Perron unit root test:

In [45]:
print(PhillipsPerron(crude.values,lags=9,trend='nc'))
print '\n'
print(PhillipsPerron(ng1.values,lags=5,trend='nc'))

Phillips-Perron Test (Z-tau)
=====================================
Test Statistic                  0.783
P-value                         0.882
Lags                                9
-------------------------------------
Trend: No Trend
Critical Values: -2.57 (1%), -1.94 (5%), -1.62 (10%)
Null Hypothesis: The process contains a unit root.
Alternative Hypothesis: The process is weakly stationary.
Phillips-Perron Test (Z-tau)
=====================================
Test Statistic                 -0.346
P-value                         0.558
Lags                                5
-------------------------------------
Trend: No Trend
Critical Values: -2.57 (1%), -1.94 (5%), -1.62 (10%)
Null Hypothesis: The process contains a unit root.
Alternative Hypothesis: The process is weakly stationary.

In [46]:
#Next we perform the Phillips-Ouliaris test and Johansen's test:
df_pre = pd.DataFrame(np.column_stack([ng1.values,crude.values]),index=ng1.index,columns=prices)
r_df = com.convert_to_r_dataframe(df_pre)
%Rpush r_df
%R y.pre = ts(r_df)
%R print(summary(ca.po(y.pre,lag='long',type='Pz',demean='constant'))) #phillips ouliaris test

########################################
# Phillips and Ouliaris Unit Root Test #
########################################

Test of type Pz detrending of series with constant only

Response log.natgas. :

Call: lm(formula = log.natgas. ~ zr)

Residuals:
     Min       1Q   Median       3Q      Max
-0.19426 -0.02135 -0.00027  0.01956  0.32696

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.002412   0.002534   0.952   0.3413
zrlog.natgas. 0.992650   0.002352 422.062   <2e-16 ***
zrlog.wti.    0.004392   0.002015   2.180   0.0294 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0376 on 3042 degrees of freedom
Multiple R-squared: 0.9945, Adjusted R-squared: 0.9945
F-statistic: 2.73e+05 on 2 and 3042 DF, p-value: < 2.2e-16

Response log.wti. :

Call: lm(formula = log.wti. ~ zr)

Residuals:
      Min        1Q    Median        3Q       Max
-0.166191 -0.014278  0.000604  0.015016  0.163356

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.0027453  0.0017716    1.55    0.121
zrlog.natgas. 0.0003775  0.0016440    0.23    0.818
zrlog.wti.    0.9986609  0.0014086  708.99   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02629 on 3042 degrees of freedom
Multiple R-squared: 0.998, Adjusted R-squared: 0.998
F-statistic: 7.642e+05 on 2 and 3042 DF, p-value: < 2.2e-16

Value of test-statistic is: 49.7624

Critical values of Pz are:
                  10pct    5pct    1pct
critical values 47.5877 55.2202 71.9273

The Phillips-Ouliaris test gives a test statistic of 49.7624. The 10% and 5% critical values are 47.5877 and 55.2202 respectively. Hence, the PO test rejects the null of no cointegration at the 10% level: we can be 90% confident that the two series were cointegrated in this time period. However, the test fails to reject at the 5% and 1% levels.

In [47]:
#Johansen test:
%R print(summary(ca.jo(y.pre,type='eigen',ecdet='const')))
%R print(summary(ca.jo(y.pre,type='trace',ecdet='const')))

######################
# Johansen-Procedure #
######################

Test type: maximal eigenvalue statistic (lambda max), without linear trend and constant in cointegration

Eigenvalues (lambda):
[1] 3.571009e-03 9.644994e-04 1.734723e-18

Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 1 |  2.94  7.52  9.24 12.97
r = 0  | 10.89 13.75 15.67 20.20

Eigenvectors, normalised to first column: (These are the cointegration relations)

               log.natgas..l2 log.wti..l2  constant
log.natgas..l2      1.0000000    1.000000   1.00000
log.wti..l2        -0.6692787   -5.942049 -18.28086
constant           -0.1539628   13.531631  24.46925

Weights W: (This is the loading matrix)

              log.natgas..l2  log.wti..l2      constant
log.natgas..d  -0.0073773112 8.752692e-05 -3.197957e-19
log.wti..d      0.0001103594 2.026133e-04 -1.094570e-20

######################
# Johansen-Procedure #
######################

Test type: trace statistic, without linear trend and constant in cointegration

Eigenvalues (lambda):
[1] 3.571009e-03 9.644994e-04 1.734723e-18

Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 1 |  2.94  7.52  9.24 12.97
r = 0  | 13.83 17.85 19.96 24.60

Eigenvectors, normalised to first column: (These are the cointegration relations)

               log.natgas..l2 log.wti..l2  constant
log.natgas..l2      1.0000000    1.000000   1.00000
log.wti..l2        -0.6692787   -5.942049 -18.28086
constant           -0.1539628   13.531631  24.46925

Weights W: (This is the loading matrix)

              log.natgas..l2  log.wti..l2      constant
log.natgas..d  -0.0073773112 8.752692e-05 -3.197957e-19
log.wti..d      0.0001103594 2.026133e-04 -1.094570e-20

#### Analysis¶

Johansen's maximal eigenvalue test gives a statistic of 10.89 for r = 0, below the 10% critical value of 13.75, and hence fails to reject no cointegration. The trace test (13.83 versus a 10% critical value of 17.85) likewise fails to reject. It is quite possible that cointegration existed during this period but was weak.

## 04/2010-05/20/2015¶

In [48]:
crude,ng1 = filter_data(wti,natgas,start='04-2010',end='05-20-2015') #filter the time period
fig,ax = plt.subplots(1,2)
rcParams['figure.figsize'] = 12,4
tsaplots.plot_pacf(crude.diff().dropna(),lags=24,ax=ax[0]); #plot the pacf for the two commodities
tsaplots.plot_pacf(ng1.diff().dropna(),lags=24,ax=ax[1]);

In [49]:
print(ADF(crude.values,trend='ct',lags=1)) #based on pacf
print '\n'
print(ADF(ng1.values,trend='ct',lags=1))

Augmented Dickey-Fuller Results
=====================================
Test Statistic                 -1.214
P-value                         0.908
Lags                                1
-------------------------------------
Trend: Constant and Linear Time Trend
Critical Values: -3.97 (1%), -3.41 (5%), -3.13 (10%)
Null Hypothesis: The process contains a unit root.
Alternative Hypothesis: The process is weakly stationary.

Augmented Dickey-Fuller Results
=====================================
Test Statistic                 -2.223
P-value                         0.477
Lags                                1
-------------------------------------
Trend: Constant and Linear Time Trend
Critical Values: -3.97 (1%), -3.41 (5%), -3.13 (10%)
Null Hypothesis: The process contains a unit root.
Alternative Hypothesis: The process is weakly stationary.

In [50]:
#Hence we accept that the two time series are integrated. Next we specify the model with a constant:
m1 = st2.ar_ols(crude.diff().dropna(),1)
m2 = st2.ar_ols(ng1.diff().dropna(),1)
print 'wti_const = ', m1.params[0]
print 'wti_p value = ', m1.pvalues[0]
print '\n'
print 'natgas_const = ', m2.params[0]
print 'natgas_p value = ', m2.pvalues[0]

wti_const =  -0.000320841124112
wti_p value =  0.530964229016

natgas_const =  -0.000315141428769
natgas_p value =  0.67879035658

In [51]:
#again, no constant and no trend in our data. Next we perform the Phillips-Perron unit root test:
print(PhillipsPerron(crude.values,trend='nc'))
print '\n'
print(PhillipsPerron(ng1.values,trend='nc'))

Phillips-Perron Test (Z-tau)
=====================================
Test Statistic                 -0.613
P-value                         0.449
Lags                               23
-------------------------------------
Trend: No Trend
Critical Values: -2.57 (1%), -1.94 (5%), -1.62 (10%)
Null Hypothesis: The process contains a unit root.
Alternative Hypothesis: The process is weakly stationary.

Phillips-Perron Test (Z-tau)
=====================================
Test Statistic                 -0.703
P-value                         0.411
Lags                               23
-------------------------------------
Trend: No Trend
Critical Values: -2.57 (1%), -1.94 (5%), -1.62 (10%)
Null Hypothesis: The process contains a unit root.
Alternative Hypothesis: The process is weakly stationary.
In [52]: st2.boxTest(m1.resid)  Xsquared 3.50661264242 pvalue 0.966873374825 Xsquared 17.0178368508 pvalue 0.651815132571  In [53]: st2.boxTest(m2.resid)  Xsquared 12.8612503594 pvalue 0.231533281821 Xsquared 20.6759900301 pvalue 0.416419172152  In [54]: #perform the phillips ouliaris test and johanset test df_post = pd.DataFrame(np.column_stack([ng1.values,crude.values]),index=ng1.index,columns=prices) r_df = com.convert_to_r_dataframe(df_post) %Rpush r_df %R y.post = ts(r_df) %R print(summary(ca.po(y.post,lag='long',type='Pz',demean='constant'))) #phillips perron unit test  ######################################## # Phillips and Ouliaris Unit Root Test # ######################################## Test of type Pz detrending of series with constant only Response log.natgas. : Call: lm(formula = log.natgas. ~ zr) Residuals: Min 1Q Median 3Q Max -0.11473 -0.01701 -0.00024 0.01623 0.16652 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.0006415 0.0127135 -0.050 0.960 zrlog.natgas. 0.9906251 0.0037985 260.796 <2e-16 *** zrlog.wti. 0.0039183 0.0040469 0.968 0.333 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.02738 on 1291 degrees of freedom Multiple R-squared: 0.9824, Adjusted R-squared: 0.9824 F-statistic: 3.601e+04 on 2 and 1291 DF, p-value: < 2.2e-16 Response log.wti. : Call: lm(formula = log.wti. ~ zr) Residuals: Min 1Q Median 3Q Max -0.108039 -0.009506 0.000511 0.009782 0.089230 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.0090995 0.0085643 1.062 0.288 zrlog.natgas. -0.0005765 0.0025588 -0.225 0.822 zrlog.wti. 0.9972892 0.0027261 365.830 <2e-16 *** --- Signif. 
codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.01845 on 1291 degrees of freedom Multiple R-squared: 0.991, Adjusted R-squared: 0.9909 F-statistic: 7.071e+04 on 2 and 1291 DF, p-value: < 2.2e-16 Value of test-statistic is: 19.89 Critical values of Pz are: 10pct 5pct 1pct critical values 47.5877 55.2202 71.9273  #### Observation¶ PO test has failed to reject the null of no cointegration. i.e the data is not co-integrated during this time period. Next we perform the Johansen test: In [55]: #Johansen test: %R print(summary(ca.jo(y.post,type='eigen',ecdet='const'))) %R print(summary(ca.jo(y.post,type='trace',ecdet='const')))  ###################### # Johansen-Procedure # ###################### Test type: maximal eigenvalue statistic (lambda max) , without linear trend and constant in cointegration Eigenvalues (lambda): [1] 4.559679e-03 9.744094e-04 2.806730e-19 Values of teststatistic and critical values of test: test 10pct 5pct 1pct r <= 1 | 1.26 7.52 9.24 12.97 r = 0 | 5.91 13.75 15.67 20.20 Eigenvectors, normalised to first column: (These are the cointegration relations) log.natgas..l2 log.wti..l2 constant log.natgas..l2 1.0000000 1.000000 1.000000 log.wti..l2 -0.4812282 8.184129 1.935679 constant 0.2732822 -26.301218 -8.135502 Weights W: (This is the loading matrix) log.natgas..l2 log.wti..l2 constant log.natgas..d -0.0088752672 -3.343469e-05 1.205727e-17 log.wti..d -0.0002789156 -2.923741e-04 -8.243375e-18  ###################### # Johansen-Procedure # ###################### Test type: trace statistic , without linear trend and constant in cointegration Eigenvalues (lambda): [1] 4.559679e-03 9.744094e-04 2.806730e-19 Values of teststatistic and critical values of test: test 10pct 5pct 1pct r <= 1 | 1.26 7.52 9.24 12.97 r = 0 | 7.17 17.85 19.96 24.60 Eigenvectors, normalised to first column: (These are the cointegration relations) log.natgas..l2 log.wti..l2 constant log.natgas..l2 1.0000000 1.000000 1.000000 log.wti..l2 -0.4812282 
8.184129 1.935679 constant 0.2732822 -26.301218 -8.135502 Weights W: (This is the loading matrix) log.natgas..l2 log.wti..l2 constant log.natgas..d -0.0088752672 -3.343469e-05 1.205727e-17 log.wti..d -0.0002789156 -2.923741e-04 -8.243375e-18  #### Observation¶ After 03-2010, Johansen test also fails to reject no cointegration. All the tests conclude that there is no co-integrating relationship anymore. A doucoupling effect has taken over since 06-29-2009. ## Vector Autoregression(VAR)¶ VAR models help to characterize the joint behavior of collection of variables. VAR models are generally used for structural inference and policy analysis. In structural analysis, certain assumptions about the causal structure of the data under investigation are imposed, and the resulting causal impacts of unexpected shocks or innovations to specified variables on the variables in the model are summarized. These causal impacts are usually summarized with impulse response functions and forecast error variance decompositions. Since we have two time series, we will use a Var(p) model of the form:$X_t = \beta_{x,0} + \beta_{x,x_1}X_{t-1} + \beta_{x,x_2}X_{t-2} + ... + \beta_{x,x_p}X_{t-p} + \beta_{x,y_1}Y_{t-1} + \beta_{x,y_2}Y_{t-2} + ... + \beta_{x,y_p}Y_{t-p} + v_t ^{x}Y_t = \beta_{y,0} + \beta_{y,y_1}Y_{t-1} + \beta_{y,y_2}Y_{t-2} + ... + \beta_{y,y_p}Y_{t-p} + \beta_{y,x_1}X_{t-1} + \beta_{y,x_2}X_{t-2} + ... + \beta_{y,x_p}X_{t-p} + v_t ^{y}$where$\beta_{x,y_p}$is coefficient of Y at equation for X at lag p and$\beta_{y,x_p}$is coefficient of X at equation for Y at lag p #### Granger Causality¶ If lagged variables of one return$X_t$helps predict current and future returns of another variable$Y_{t}$better than$Y_{t}$alone, this is called$X_{t}$Granger Causes$Y_{t}$, This is the lead-lag relationship between the two returns. In order to test for Granger Casuality from one variable to another, we test joint significance of all variables containing lagged Y in the first equation. 
Y Granger causes X if and only if $H_0$: $\beta_{x,y_1} = \beta_{x,y_2} = \dots = \beta_{x,y_p} = 0$ is rejected.

#### Toda and Yamamoto procedure to determine Granger Causality¶

1) Test each of the time series to determine its order of integration.
2) Let the maximum order of integration for the group of time series be m.
3) Set up a VAR model in the levels (not the differences) of the data, regardless of the orders of integration of the various time series.
4) Determine the appropriate maximum lag length for the variables in the VAR, say p, using the usual methods.
5) Make sure that the VAR is well specified.
6) If two or more of the time series have the same order of integration at Step 1, test whether they are cointegrated.
7) Whatever you conclude about cointegration at Step 6 does not affect what follows; it just provides a possible cross-check on the validity of your results at the very end of the analysis.
8) Take the preferred VAR model and add m additional lags of each of the variables into each of the equations.
9) Test for Granger non-causality as follows. For expository purposes, suppose the VAR has two equations, one for X and one for Y. Test the hypothesis that the coefficients of (only) the first p lagged values of X are zero in the Y equation, using a standard Wald test. Then do the same for the coefficients of the lagged values of Y in the X equation.
10) Do not include the coefficients of the 'extra' m lags when you perform the Wald tests.
11) The Wald test statistics are asymptotically chi-square distributed with p degrees of freedom under the null.
12) Rejection of the null implies rejection of Granger non-causality. Finally, look back at what you concluded in Step 6 about cointegration.

#### Analysis¶

We established earlier, using ADF tests, that each of the time series is I(1); hence the maximum order of integration is m = 1. We also note that there is a structural break in 2009.
Causality tests are very sensitive to structural breaks in the data. Hence we omit the break and consider the period 02/2009-2015 as well as a pre-structural-break period from 1998-01/2009:

#### 04/2010 - Current¶

In [60]:
#plot of the data
fig,axes=plt.subplots(1,2)
df_post['log(wti)'].plot(title='log(WTI)',ax=axes[0],color='black')
df_post['log(natgas)'].plot(title='log(NatGas)',ax=axes[1],color='orange')

Out[60]:
<matplotlib.axes.AxesSubplot at 0x7fde14809e50>

#### Analysis¶

The spread between the two series has clearly widened compared to the past. I chose 04-2010 as the starting point for the Granger causality tests because that is when the relationship between the two series appears to decouple.

In [61]:
%R print(VARselect(y.post,lag=20,type='const')$selection) #step 4) Determining maximum lag length

AIC(n)  HQ(n)  SC(n) FPE(n)
2      1      1      2

In [62]:
#Either lag 1 or 2
#step 3) specifying a VAR model on the levels
#Step 5) Making sure the VAR model is well specified:
%R v_1 <- VAR(y.post,p=1,type='const')
%R v_2 <- VAR(y.post,p=2,type='const') #lag 2

%R print(serial.test(v_1,lags.bg=16,type='BG'))
%R print(serial.test(v_2,lags.bg=16,type='BG'))

	Breusch-Godfrey LM test

data:  Residuals of VAR object v_1
Chi-squared = 67.524, df = 64, p-value = 0.3577


	Breusch-Godfrey LM test

data:  Residuals of VAR object v_2
Chi-squared = 59.97, df = 64, p-value = 0.6197



#### Analysis¶

Both the lag-1 and lag-2 models show no serial correlation in the residuals; the Breusch-Godfrey Lagrange Multiplier test confirms this with p-values of 0.3577 and 0.6197 respectively. Based on the AIC criterion I select the lag-2 model. Hence we choose lag 2:

In [65]:
%R print(coef(v_2))

$log.natgas.
                   Estimate  Std. Error    t value      Pr(>|t|)
log.natgas..l1  0.926942866  0.02800107 33.1038318 2.147082e-174
log.wti..l1     0.011203670  0.04159033  0.2693816  7.876792e-01
log.natgas..l2  0.064148432  0.02799846  2.2911415  2.211609e-02
log.wti..l2    -0.007206275  0.04166948 -0.1729389  8.627266e-01
const          -0.001546080  0.01271948 -0.1215521  9.032727e-01

$log.wti.
Estimate  Std. Error    t value      Pr(>|t|)
log.natgas..l1 -0.013841171 0.018871576 -0.7334401  4.634235e-01
log.wti..l1     0.936408609 0.028030176 33.4071615 9.390538e-177
log.natgas..l2  0.013269881 0.018869813  0.7032333  4.820375e-01
log.wti..l2     0.061332785 0.028083523  2.1839420  2.914639e-02
const           0.007613573 0.008572405  0.8881489  3.746265e-01


In [66]:
#stability analysis
print(robj.r('1/roots(v_2)[[2]]')) # > 1
print(robj.r('1/roots(v_2)[[1]]')) # > 1
#alternative stability analysis:
#computes an empirical fluctuation process

%R stability_v2 <- stability(v_2)
%R plot(stability_v2)

[1] 1.008164

[1] 1.002454
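The stability condition can be verified directly from the coefficients printed by coef(v_2): stack the lag-1 and lag-2 blocks into the VAR's companion matrix and check that every eigenvalue lies inside the unit circle (equivalently, that the inverse roots exceed 1, as the output above reports). A sketch using those printed estimates:

```python
import numpy as np

# Coefficient matrices read off the coef(v_2) output above
# (rows: equations for log(natgas) and log(wti)).
A1 = np.array([[ 0.926942866,  0.011203670],
               [-0.013841171,  0.936408609]])   # lag-1 block
A2 = np.array([[ 0.064148432, -0.007206275],
               [ 0.013269881,  0.061332785]])   # lag-2 block

# Companion form of the VAR(2): a 4x4 first-order system
companion = np.block([[A1, A2],
                      [np.eye(2), np.zeros((2, 2))]])

moduli = np.sort(np.abs(np.linalg.eigvals(companion)))
print(moduli)           # all < 1 -> stable
print(1 / moduli[-2:])  # compare with R's 1/roots(v_2) output above
```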


In [70]:
%R plot(stability(v_2,type='ME'))

In [97]:
strucchange = importr('strucchange')
%R wt <- breakpoints(y.post[,2] ~ 1)
%R print(wt)
%R print(summary(wt))

	 Optimal 5-segment partition:

Call:
breakpoints.formula(formula = y.post[, 2] ~ 1)

Breakpoints at observation number:
225 535 807 1101

Corresponding to breakdates:
225 535 807 1101

	 Optimal (m+1)-segment partition:

Call:
breakpoints.formula(formula = y.post[, 2] ~ 1)

Breakpoints at observation number:

m = 1                   1101
m = 2   194             1101
m = 3   194         819 1101
m = 4   225     535 807 1101
m = 5   194 405 623 817 1101

Corresponding to breakdates:

m = 1                   1101
m = 2   194             1101
m = 3   194         819 1101
m = 4   225     535 807 1101
m = 5   194 405 623 817 1101

Fit:

m   0        1        2        3        4        5
RSS    48.56    24.88    19.18    18.22    17.67    18.00
BIC  -562.76 -1414.17 -1736.90 -1789.07 -1814.81 -1775.89
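breakpoints() searches over segment partitions for the one minimizing the residual sum of squares, then uses BIC to pick the number of breaks m (here m = 4 minimizes BIC). For intuition, here is a minimal pure-Python sketch of the single-break (m = 1) case; the function and the synthetic series are illustrative, not the notebook's data:

```python
import numpy as np

def best_single_break(y, trim=10):
    """Exhaustive search for one mean-shift breakpoint by minimizing the
    residual sum of squares -- the objective breakpoints() (Bai-Perron)
    optimizes in the m = 1 case."""
    best_rss, best_k = np.inf, None
    for k in range(trim, len(y) - trim):
        rss = (((y[:k] - y[:k].mean()) ** 2).sum()
               + ((y[k:] - y[k:].mean()) ** 2).sum())
        if rss < best_rss:
            best_rss, best_k = rss, k
    return best_rss, best_k

# Synthetic series with a known mean shift at observation 300.
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 0.1, 300), rng.normal(1.0, 0.1, 200)])
rss, k = best_single_break(y)
print(k)  # estimated break location, close to the true break at 300
```

Extending this to m breaks is done with dynamic programming over segment boundaries, which is what breakpoints() implements.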

In [98]:
robj.r('''ocus2 <- efp(y.post[,2]~1,type='OLS-CUSUM')''')
robj.r('''ocus1 <- efp(y.post[,1]~1,type='OLS-CUSUM')''')
%R plot(ocus2)

In [99]:
%R plot(ocus1)

In [ ]:


In [105]:
rcParams['figure.figsize'] = 9,4
%R fm0 <- lm(y.post[,2] ~ 1)
%R fm1 <- lm(y.post[,2]~breakfactor(wt,breaks=4))
fitted_fm0 = %R fitted(fm0)
fitted_fm1 = %R fitted(fm1)
df_post['log(wti)'].plot(color='black')
plt.plot(df_post.index,fitted_fm1,color='orange')
fir = pd.to_datetime('2011-02-22')
sec = pd.to_datetime('2012-05-15')
third = pd.to_datetime('2013-06-13')
fourth = pd.to_datetime('2014-08-13')
plt.plot((fir,fir),(2.4,3.5),'b--')
plt.plot((sec,sec),(2.4,3.5),'b--')
plt.plot((third,third),(2.4,3.5),'b--')
plt.plot((fourth,fourth),(2.4,3.5),'b--')
plt.title('wti post')

Out[105]:
<matplotlib.text.Text at 0x7fde0f938d50>
In [ ]:


In [106]:
%R wt <- breakpoints(y.pre[,2] ~ 1)
%R print(wt)
%R print(summary(wt))

	 Optimal 5-segment partition:

Call:
breakpoints.formula(formula = y.pre[, 2] ~ 1)

Breakpoints at observation number:
456 1220 1676 2361

Corresponding to breakdates:
456 1220 1676 2361

	 Optimal (m+1)-segment partition:

Call:
breakpoints.formula(formula = y.pre[, 2] ~ 1)

Breakpoints at observation number:

m = 1            1628
m = 2   456      1644
m = 3   456      1630      2360
m = 4   456 1220 1676      2361
m = 5   456 1078 1534 1990 2446

Corresponding to breakdates:

m = 1            1628
m = 2   456      1644
m = 3   456      1630      2360
m = 4   456 1220 1676      2361
m = 5   456 1078 1534 1990 2446

Fit:

m   0      1      2      3      4      5
RSS 1058.8  287.0  164.2  138.3  128.7  130.4
BIC 5441.4 1481.0 -202.6 -709.4 -913.7 -857.1

In [107]:
df_pre.index[[456,1220,1676,2361]]

Out[107]:
DatetimeIndex(['1999-10-27', '2002-11-20', '2004-09-21', '2007-06-15'], dtype='datetime64[ns]', name=u'Date', freq=None, tz=None)
In [111]:
%R fm0 <- lm(y.pre[,2] ~ 1)
%R fm1 <- lm(y.pre[,2]~breakfactor(wt,breaks=4))
fitted_fm0 = %R fitted(fm0)
fitted_fm1 = %R fitted(fm1)
df_pre['log(wti)'].plot(color='black')
plt.plot(df_pre.index,fitted_fm0,color='green')
plt.plot(df_pre.index,fitted_fm1,color='red')
fir = pd.to_datetime('1999-10-27')
sec = pd.to_datetime('2002-11-20')
third = pd.to_datetime('2004-09-21')
fourth = pd.to_datetime('2007-06-15')
plt.plot((fir,fir),(1.0,3.5),'b--')
plt.plot((sec,sec),(1.0,3.5),'b--')
plt.plot((third,third),(1.0,3.5),'b--')
plt.plot((fourth,fourth),(1.0,3.5),'b--')
plt.plot(df_pre.index,df_pre['log(natgas)'].values,color='orange')

Out[111]:
[<matplotlib.lines.Line2D at 0x7fde0f4f8e90>]