Records of 'ice-out' and 'ice-in' days on selected Minnesota lakes are available from the Minnesota DNR at their web site Ice Out Dates. The customs for determining ice-out and ice-in vary from lake to lake, so it's difficult to compare across different lakes. The DNR does attempt to use the same observers and criteria for each lake so longitudinal data series can provide useful insights.
# Display graphics inline with the notebook
%matplotlib inline
# Standard Python modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os
import datetime
import seaborn as sns
sns.set_context('talk')
# Statistics modules
import statsmodels.api as sm
# Directory where data files are stored
dir = "../data/"
img = "../images/"
# Read .csv file
RL_ICEOUT = pd.read_csv(dir + 'IceOut_RL.txt',index_col=0,parse_dates=True,skiprows=1);
# Remove duplicated data
RL_ICEOUT = RL_ICEOUT[RL_ICEOUT[' source']=='MNDNR-SCO']
# Drop unneeded columns
RL_ICEOUT.drop(' source',1,inplace=True)
RL_ICEOUT.drop(' comments',1,inplace=True)
# Create a new independent variable that counts years from the first date in the data set
RL_ICEOUT['DOY'] = RL_ICEOUT.index.dayofyear
RL_ICEOUT['n'] = RL_ICEOUT.index.year - RL_ICEOUT.index.year[0]
RL_ICEOUT.head()
DOY | n | |
---|---|---|
ice out date | ||
1930-05-03 | 123 | 0 |
1931-05-05 | 125 | 1 |
1932-05-07 | 128 | 2 |
1933-05-06 | 126 | 3 |
1934-05-06 | 126 | 4 |
# regression
y = RL_ICEOUT['DOY']
x = RL_ICEOUT['n']
def regress(y,x):
model = sm.OLS(y,sm.add_constant(x))
results = model.fit()
print(results.params)
print(results.summary())
plt.plot(y.index.year,y)
plt.plot(y.index.year,results.fittedvalues)
plt.yticks([92,102,112,122,132,142],
['Apr 1','Apr 10','Apr 20','May 1','May 10','May 20']);
ax = plt.gca()
# standard deviation of the residuals
Sy = np.sqrt(np.sum(results.resid**2)/(len(x)-2))
# plot the prediction interval (i.e., confidence interval for any particular date)
from scipy import stats
t = stats.t.ppf(1-0.025,df=len(x)-results.df_model-1)
conf = t*Sy*np.sqrt(1 + 1.0/len(x) + (x-x.mean())**2/np.sum((x-x.mean())**2))
ax.fill_between(y.index.year,results.fittedvalues-conf,results.fittedvalues+conf,alpha=0.2)
# plot confidence interval of the regression
conf = t*Sy*np.sqrt(1.0/len(x) + (x-x.mean())**2/np.sum((x-x.mean())**2))
ax.fill_between(y.index.year,results.fittedvalues-conf,results.fittedvalues+conf,alpha=0.4)
plt.xlabel('Year')
plt.ylabel('Day of Year')
regress(y,x)
# add labels
plt.title('Rainy Lake Ice Out')
fname = '../images/IceOut_RL.png'
plt.savefig(fname)
!convert $fname -trim $fname
# get current axis for use in plotting data from other lakes
rl = plt.gca()
const 126.906175 n -0.081787 dtype: float64 OLS Regression Results ============================================================================== Dep. Variable: DOY R-squared: 0.055 Model: OLS Adj. R-squared: 0.044 Method: Least Squares F-statistic: 4.913 Date: Fri, 23 Jun 2017 Prob (F-statistic): 0.0294 Time: 08:11:11 Log-Likelihood: -305.00 No. Observations: 86 AIC: 614.0 Df Residuals: 84 BIC: 618.9 Df Model: 1 Covariance Type: nonrobust ============================================================================== coef std err t P>|t| [95.0% Conf. Int.] ------------------------------------------------------------------------------ const 126.9062 1.816 69.881 0.000 123.295 130.518 n -0.0818 0.037 -2.217 0.029 -0.155 -0.008 ============================================================================== Omnibus: 2.212 Durbin-Watson: 2.108 Prob(Omnibus): 0.331 Jarque-Bera (JB): 1.576 Skew: -0.290 Prob(JB): 0.455 Kurtosis: 3.321 Cond. No. 97.6 ============================================================================== Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
The relatively long data series for Rainy Lake provides a statistically signficant trend line. The ice-out day is consistently getting earlier and, at the some time, becoming more variable. Thus in the last few years, Rainy Lake has set earliest ice-out records and nearly broken lastest ice-out records.
# Read .csv file
KL_ICEOUT = pd.read_csv(dir + 'IceOut_KL.txt',index_col=0,parse_dates=True,skiprows=1)
# Remove duplicated data
KL_ICEOUT = KL_ICEOUT[KL_ICEOUT[' source']=='MNDNR-SCO']
# Drop unneeded columns
KL_ICEOUT.drop(' source',1,inplace=True)
KL_ICEOUT.drop(' comments',1,inplace=True)
# Determine Trend Line
KL_ICEOUT['DOY'] = KL_ICEOUT.index.dayofyear
KL_ICEOUT['n'] = pd.Series(np.arange(0, len(KL_ICEOUT)), index=KL_ICEOUT.index)
y = KL_ICEOUT['DOY']
x = KL_ICEOUT['n']
regress(y,x)
plt.title('Kabetogoma Lake Ice Out')
plt.ylabel('Day of Year')
plt.xlabel('Year')
plt.axis(rl.axis())
fname = img + 'IceOut_KL.png'
plt.savefig(fname)
!convert $fname -trim $fname
const 120.137931 n -0.203339 dtype: float64 OLS Regression Results ============================================================================== Dep. Variable: DOY R-squared: 0.024 Model: OLS Adj. R-squared: -0.014 Method: Least Squares F-statistic: 0.6273 Date: Fri, 23 Jun 2017 Prob (F-statistic): 0.436 Time: 08:11:14 Log-Likelihood: -105.77 No. Observations: 28 AIC: 215.5 Df Residuals: 26 BIC: 218.2 Df Model: 1 Covariance Type: nonrobust ============================================================================== coef std err t P>|t| [95.0% Conf. Int.] ------------------------------------------------------------------------------ const 120.1379 4.039 29.744 0.000 111.835 128.440 n -0.2033 0.257 -0.792 0.436 -0.731 0.324 ============================================================================== Omnibus: 0.502 Durbin-Watson: 2.221 Prob(Omnibus): 0.778 Jarque-Bera (JB): 0.044 Skew: -0.076 Prob(JB): 0.978 Kurtosis: 3.120 Cond. No. 30.7 ============================================================================== Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
# Read .csv file
LOW_ICEOUT = pd.read_csv(dir + 'IceOut_LOW.txt',index_col=0,parse_dates=True,skiprows=1);
# Remove duplicated data
LOW_ICEOUT = LOW_ICEOUT[LOW_ICEOUT[' source']=='MNDNR-SCO']
# Drop unneeded columns
LOW_ICEOUT.drop(' source',1,inplace=True)
LOW_ICEOUT.drop(' comments',1,inplace=True)
# Determine Trend Line
LOW_ICEOUT['DOY'] = LOW_ICEOUT.index.dayofyear
LOW_ICEOUT['n'] = pd.Series(np.arange(0, len(LOW_ICEOUT)), index=LOW_ICEOUT.index)
y = LOW_ICEOUT['DOY']
x = LOW_ICEOUT['n']
regress(y,x)
plt.title('Lake of the Woods Ice Out')
plt.ylabel('Day of Year')
plt.xlabel('Year')
plt.axis(rl.axis())
fname = img + 'IceOut_LOW.png'
plt.savefig(fname)
!convert $fname -trim $fname
const 118.825123 n 0.118774 dtype: float64 OLS Regression Results ============================================================================== Dep. Variable: DOY R-squared: 0.009 Model: OLS Adj. R-squared: -0.029 Method: Least Squares F-statistic: 0.2272 Date: Fri, 23 Jun 2017 Prob (F-statistic): 0.638 Time: 08:11:18 Log-Likelihood: -104.93 No. Observations: 28 AIC: 213.9 Df Residuals: 26 BIC: 216.5 Df Model: 1 Covariance Type: nonrobust ============================================================================== coef std err t P>|t| [95.0% Conf. Int.] ------------------------------------------------------------------------------ const 118.8251 3.920 30.313 0.000 110.768 126.883 n 0.1188 0.249 0.477 0.638 -0.393 0.631 ============================================================================== Omnibus: 1.681 Durbin-Watson: 2.261 Prob(Omnibus): 0.431 Jarque-Bera (JB): 1.136 Skew: -0.492 Prob(JB): 0.567 Kurtosis: 2.936 Cond. No. 30.7 ============================================================================== Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.