Estimating Rainy Lake Inflows 1971-2014

  • <a href=#section01>Direct Estimate using the Balance Equation</a>
  • <a href=#section02>Estimating Rainy Lake Inflows with a Kalman Filter</a>
  • <a href=#section03>Comparing Inflows in 1971-1999 to 2000-2010</a>
  • <a href=#section04>Correlation of Precipitation with Lake Inflow</a>

Initialization

Initilization of the graphics system and computational modules used in this IPython notebook.

In [1]:
# Display graphics inline with the notebook
%matplotlib inline

# Standard Python modules
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import os
import datetime

# Modules to display images and data tables
from IPython.display import Image
from IPython.core.display import display

# Data Directory
dir = '../data/'
img = '../images/'

<a name=section01></a>Direct Estimate using the Balance Equation

A crude estimate of inflows can be found starting with the balance equation

$$A_{RL}\frac{dH_{RL}}{dt} = I_{RL}(t) - O_{RL}(t)$$

where the subscript RL refers to Rainy Lake, and $A$, $H$, $I$, and $O$ refer to area, lake level, inflow, and outflow, respectively. Discretizing in time

$$\frac{A_{RL}}{\Delta t}\left(H_{RL}(k+1)-H_{RL}(k)\right) = I_{RL}(k) - O_{RL}(k)$$

Since the sole output flow from Rainy Lake is through Rainy River, we can solve for inflow

$$I_{RL}(k) = \frac{A_{RL}}{\Delta t}\left(H_{RL}(k+1)-H_{RL}(k)\right) + O_{RL}(k)$$

The estimation method is sensitive to small errors in level measurement and therefore provides an estimate of the inflow plagued by excessive noise.

In [2]:
# Load Rainy River Flow and Rainy Lake Level data
RR = pd.read_pickle(dir + 'RR.pkl')
RL = pd.read_pickle(dir + 'RL.pkl')

# Load regression formulae for lake areas
import pickle
with open(dir + 'area.pkl', 'rb') as handle:
  area = pickle.load(handle)
In [3]:
area
Out[3]:
{'Namakan Reservoir': poly1d([   20.28853147, -6652.59327508]),
 'Rainy Lake': poly1d([    90.73298003, -29751.96877469])}
In [4]:
# Compute time series for Rainy Lake area in sq. meters
areaRL = pd.Series(area['Rainy Lake'](RL), index = RL.index) * 1.0e6

# Estimate Rainy Lake inflow in cubic meters per second
RLinflow = areaRL*(RL.shift(-1) - RL)/(24*3600) + RR

plt.figure(figsize=(10,3.5))
RLinflow.plot(linewidth=0.5)
plt.title('Rainy Lake Inflows {0}-{1}'.format(
    RLinflow.index.min().strftime('%Y'),
    RLinflow.index.max().strftime('%Y')))
plt.ylabel('cubic meters/sec')
plt.grid()

Constructing an Inflow Data Set for Rainy Lake

Examining the estimated inflows to Rainy Lake, there are evident inconsistencies in the data prior to 1920, and that may extend through to the about 1970. For that reason, only data from 1970 on is retained in a data set to be used for simulating lake levels under alternative control regimes.

In [5]:
# Plot raw data from 1970 forward
plt.figure(figsize=(10,6))
plt.subplot(3,1,1)
RLinflow['1970':].plot()
plt.title('Rainy Lake Inflow 1970--')
plt.grid()

plt.subplot(3,1,2)
RR['1970':].plot()
plt.title('Rainy Lake Outflow 1970--')
plt.grid()

plt.subplot(3,1,3)
RL['1970':].plot()
plt.title('Rainy Lake Level 1970--')
plt.grid()

plt.tight_layout()

# Plot flow histograms and save image
fig = plt.figure(figsize=(10,5))
ax = fig.add_subplot(1,1,1)

plt.hold(True)
RLinflow['1970':].hist(bins=range(0, 3000 + 25, 25), \
                       color = 'blue', alpha = 0.5, log=True, normed=1)
RR['1970':].hist(bins=range(0, 3000 + 25, 25), \
                 color = 'yellow', alpha = 0.5, log=True, normed=1)
plt.plot([950.0,950.0],[0,0.001],'r',lw=2,alpha=0.6)
plt.hold(False)
plt.xlim(-50,2000)
plt.ylim(0,0.01)

plt.xlabel('Flow [cubic meters/second]')
plt.ylabel('Frequency')
plt.legend(['Outflow at AGO Level', 'Inflow','Outflow'])
plt.title('Frequency vs Flowrates for Rainy Lake, 1970-2014')

plt.savefig(img + 'RLFlowFreq.png')

# Plot level histograms and save image

fig = plt.figure(figsize=(10,4))
ax = fig.add_subplot(1,1,1)

plt.hold(True)
RL['1970':].hist(bins=np.arange(336.3,339.0,0.05), \
                       color = 'blue', alpha = 0.5, log=True, normed=1)
plt.plot([337.9,337.9],[0.001,1.0],'r',lw=2)
plt.hold(False)
plt.xlim(336.3,339.0)
plt.ylim(0.001,3)

plt.legend(['AGO','Level'])
plt.xlabel('Level [meters]')
plt.ylabel('Frequency')
plt.title('Frequency vs Stage for Rainy Lake, 1970-2014')

plt.savefig(img + 'RLLevelFreq.png')
In [6]:
yr = '1970'

RLLevelFlow = pd.concat([RL[yr:],RLinflow[yr:],RR[yr:],],axis=1)
RLLevelFlow.columns = ['H','I','O']
RLLevelFlow = RLLevelFlow.interpolate()
RLLevelFlow.to_csv(dir + 'RLLevelFlow.csv')

RLLevelFlow['1971']
Out[6]:
H I O
1971-01-01 337.365 389.608017 479.0
1971-01-02 337.356 419.462053 479.0
1971-01-03 337.350 376.833097 476.0
1971-01-04 337.340 383.844301 473.0
1971-01-05 337.331 406.309788 436.0
1971-01-06 337.328 398.319239 428.0
1971-01-07 337.325 368.657381 428.0
1971-01-08 337.319 368.695186 428.0
1971-01-09 337.313 322.099488 411.0
1971-01-10 337.304 381.394850 411.0
1971-01-11 337.301 368.808603 428.0
1971-01-12 337.295 398.423204 428.0
1971-01-13 337.292 334.865311 394.0
1971-01-14 337.286 349.451558 379.0
1971-01-15 337.283 379.000000 379.0
1971-01-16 337.283 313.075689 382.0
1971-01-17 337.276 290.449188 379.0
1971-01-18 337.267 377.000000 377.0
1971-01-19 337.267 349.511417 379.0
1971-01-20 337.264 261.083472 379.0
1971-01-21 337.252 298.117347 357.0
1971-01-22 337.246 295.155152 354.0
1971-01-23 337.240 315.192958 374.0
1971-01-24 337.234 288.846145 377.0
1971-01-25 337.225 254.789521 382.0
1971-01-26 337.212 320.369383 379.0
1971-01-27 337.206 289.110782 377.0
1971-01-28 337.197 340.000000 340.0
1971-01-29 337.197 243.195845 331.0
1971-01-30 337.188 298.760302 328.0
... ... ... ...
1971-12-02 337.792 675.191433 779.0
1971-12-03 337.782 747.888935 779.0
1971-12-04 337.779 651.593543 776.0
1971-12-05 337.767 679.808574 773.0
1971-12-06 337.758 586.787272 773.0
1971-12-07 337.740 604.780018 770.0
1971-12-08 337.724 674.214982 767.0
1971-12-09 337.715 643.400059 767.0
1971-12-10 337.703 610.689101 765.0
1971-12-11 337.688 604.925384 759.0
1971-12-12 337.673 622.673446 756.0
1971-12-13 337.660 688.546579 750.0
1971-12-14 337.654 619.168769 742.0
1971-12-15 337.642 619.319990 742.0
1971-12-16 337.630 603.293944 736.0
1971-12-17 337.617 587.267898 719.0
1971-12-18 337.604 571.241851 725.0
1971-12-19 337.591 555.215805 728.0
1971-12-20 337.578 539.189759 722.0
1971-12-21 337.560 627.765004 719.0
1971-12-22 337.551 526.572361 719.0
1971-12-23 337.532 509.735830 722.0
1971-12-24 337.511 590.713533 742.0
1971-12-25 337.496 533.529744 745.0
1971-12-26 337.475 562.088778 753.0
1971-12-27 337.456 602.579907 753.0
1971-12-28 337.441 580.816191 731.0
1971-12-29 337.426 575.052475 725.0
1971-12-30 337.411 589.250257 719.0
1971-12-31 337.398 565.592246 745.0

365 rows × 3 columns

Plotting the estimated inflow for a typical year demonstrates the noisy characteristics of this estimate.

In [7]:
yr = '2014'

plt.figure(figsize=(9,7))
plt.subplot(2,1,1)
plt.hold(True)
RLinflow[yr].plot(color='b',linewidth=1)
RR[yr].plot(color='r')
plt.hold(False)

ax = plt.axis()
plt.ylim([0,ax[3]])
plt.ylabel('[cubic meters/sec]')
plt.title('Estimated Inflow to Rainy Lake ' + yr)
plt.legend(['Inflow','Outflow'],loc='upper left')

plt.subplot(2,1,2)
RL[yr].plot()
plt.ylabel('Level [meters]')
plt.title('Rainy Lake Level')
plt.tight_layout()

<a name=section02></a>Estimating Rainy Lake Inflows with a Kalman Filter

For this first model we assume the inflows and outflows to Rainy Lake are exogenous disturbances driven by zero mean white noise. Later we can consider models that incorporate features such as mean seasonal inflows, the dependence of outflow on lake level, and the an additional state variable representing the International Falls Dam. The purpose of this first model is to simply test the idea that a filtering and estimation techniques can provide better knowledge of the ungauged inflows to Rainy Lake.

The model is written

$$\underbrace{\left[\begin{array}{c} H_{RL}(k+1) \\ I_{RL}(k+1) \\ O_{RL}(k+1) \end{array}\right]}_{x(k+1)} = \underbrace{\left[ \begin{array}{ccc} 1 & \frac{\Delta t}{A_{RL}} & -\frac{\Delta t}{A_{RL}} \\0 & 1 & 0\\0 & 0 & 1\end{array}\right]}_{A} \underbrace{\left[\begin{array}{c} H_{RL}(k) \\ I_{RL}(k) \\ O_{RL}(k) \end{array}\right]}_{x(k)} + \underbrace{\left[ \begin{array}{c} 0 \\ w_{I}(k) \\ w_{O}(k)\end{array}\right]}_{w(k)}$$

where $H$, $I$, $O$ are lake level, inflow, and outflow, respectively. The signals $w_I(k)$, and $w_O(k)$ are zero-mean independent and identically distributed random increments to changing lake inflow and outflows. The vector $w(k)$ is assumed to be distributed as multivariate Normal distribution with zero mean and a covariance $Q$.

The measurement model is written

$$\underbrace{\left[\begin{array}{c} y_H(k) \\ y_O(k) \end{array}\right]}_{y(k)} = \underbrace{\left[\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 0 & 1 \end{array}\right]}_{C} \underbrace{\left[\begin{array}{c} H_{RL}(k) \\ I_{RL}(k) \\ O_{RL}(k) \end{array}\right]}_{x(k)} + \underbrace{\left[\begin{array}{c} v_{H}(k) \\ v_{O}(k) \end{array}\right]}_{v(k)} $$

where the measurement noise vector $v(k)$ is an i.i.d. random variate with zero mean and a covariance $R$.

A Kalman filter provides a two step method for estimate values of the state vector $x(k)$. The the estimate of $x(k)$ using all information available up to time $k-1$ is the prediction step given by

$$\hat{x}(k|k-1) = A \hat{x}(k-1|k-1)$$

with covariance

$$\hat{P}(k|k-1) = A \hat{P}(k-1|k-1) A^T + Q$$

In the measurement update step, an innovation is the difference between the actual measurement at time $k$ versus the predicted measurement

$$e(k) = y(k) - C \hat{x}(k|k-1)$$

with covariance

$$S(k) = C\hat{P}(k|k-1)C^T + R$$

The Kalman filter gain is given by

$$K(k) = \hat{P}(k|k-1)C^TS^{-1}(k)$$

which is used to compute the updated state estimate

$$\hat{x}(k|k) = \hat{x}(x|k-1) + K(k)e(k)$$

and

$$\hat{P}(k|k) = \hat{P}(k|k-1) - K(k)S(k)K^T(k)$$

This is implemented in the following cells for the purpose of estimating inflow to Rainy Lake from historical data of lake levels and outflows.

In [8]:
areaRL = float(97500*10000)
dt = float(24*3600)

A = np.array([[1.0, dt/areaRL, -dt/areaRL],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]])

C = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

# Covariance of state inputs
Q = np.array([[0.0, 0.0, 0.0],
              [0.0, 10000.0, 0.0],
              [0.0, 0.0, 10000.0]])

# Measurement covariance
R = np.array([[0.0001, 0],[0.0, 1000.0]])
In [9]:
# Load Rainy Lake Level and Rainy River Flow data

yrA = '1971'
yrB = '2010'

y = pd.concat([pd.read_pickle(dir+'RL.pkl')[yrA:yrB],
               pd.read_pickle(dir+'RR.pkl')[yrA:yrB]],axis=1)
y.columns = ['H','O']

y = y.interpolate()
In [10]:
# DataFrame to store results
x = pd.DataFrame(index = y.index,columns=('H','I','O'))
e = pd.DataFrame(index = y.index,columns=('H','O'))

# Initial values for x and P

x.ix[0] = np.array([y.ix[0]['H'], y.ix[0]['O'], y.ix[0]['O']])
e.ix[0] = np.array([0,0])
P = np.array([[0.001, 0.0, 0.0], [0.0, 100.0, 0.0], [0.0, 0.0, 100.0]])

# Kalman Filter
for k in range(1,len(y.index)):
    # Prediction
    xpred = np.dot(A,x.ix[k-1])
    Ppred = np.dot(np.dot(A,P),np.transpose(A)) + Q
    # Measurement
    ymeas = np.array([y.ix[k]['H'],y.ix[k]['O']])
    # Update
    e.ix[k] = ymeas - np.dot(C,xpred)
    S = np.dot(np.dot(C,Ppred),np.transpose(C)) + R
    K = np.dot(np.dot(Ppred,np.transpose(C)),np.linalg.inv(S))
    x.ix[k] = xpred + np.dot(K,e.ix[k])
    P = Ppred - np.dot(np.dot(K,S),np.transpose(K))
    
# Pickle Rainy Lake Inflows for use in other notebooks
x['I'].to_pickle('../data/RLInflows.pkl')
x['H'].to_pickle('../data/RLLevels.pkl')
x['O'].to_pickle('../data/RLOutflows.pkl')

x.to_csv('../data/RLEstimates.csv')
In [11]:
plt.figure(figsize=(9,7))

plt.subplot(2,1,1)
plt.hold(True)
x['I'].plot(linewidth=0.5)
x['O'].plot(linewidth=0.5)
plt.hold(False)
plt.title('Estimated Inflow and Outflows ' + yrA + '-' + yrB)
plt.legend(['Estimated Inflow (I)','Estimated Outflow (O)'])

plt.subplot(2,1,2)
plt.hold(True)
x['H'].plot(linewidth=0.5)
RL[yrA:yrB].plot(linewidth=0.5)
plt.hold(False)
plt.title('Estimated Lake Level ' + yrA + '-' + yrB)
plt.legend(['Estimated Level (H)','Measured Level'])

plt.tight_layout()
    
plt.figure(figsize=(9,7))

plt.subplot(2,1,1)
e['H'].plot(linewidth=0.5)
plt.title('Level Estimation Error ' + yrA + '-' + yrB)

plt.subplot(2,1,2)
e['O'].plot(linewidth=0.5)
plt.title('Outflow Estimation Error ' + yrA + '-' + yrB)

plt.tight_layout()

<a name=section03></a>Comparing Inflows in 1971-1999 to 2000-2010

In [12]:
# Shift to a common year
I = pd.DataFrame([])
for yr,r in x['I'].groupby(x.index.year):
    shift = datetime.datetime(2014,1,1) - datetime.datetime(yr,1,1)
    r = r.tshift(shift.days)
    r.name = yr
    I = pd.concat([I,r],axis=1)
    
# Plot
plt.figure(figsize=(9,6))
plt.subplot(2,1,1)
idx = range(1971,2000)
I[idx].mean(axis=1).plot(color='r',linewidth=4)
plt.hold(True)
I[idx].plot(ax=plt.gca(),legend=False)
I[idx].mean(axis=1).plot(color='r',linewidth=4)
plt.ylim(0,1800)
plt.title('Rainy Lake Inflow 1971-1999')
plt.hold(False)

import matplotlib.dates as mdates

plt.gca().xaxis.set_major_locator(mdates.MonthLocator())
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b'))

# Plot
plt.subplot(2,1,2)
idx = range(2000,2011)
I[idx].mean(axis=1).plot(color='r',linewidth=4)
plt.hold(True)
I[idx].plot(ax=plt.gca(),legend=False)
I[idx].mean(axis=1).plot(color='r',linewidth=4)
plt.ylim(0,1800)
plt.title('Rainy Lake Inflow 2000-2010')
plt.hold(False)

import matplotlib.dates as mdates

plt.gca().xaxis.set_major_locator(mdates.MonthLocator())
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b'))

plt.tight_layout()