Water Survey of Canada HYDAT¶

Historical data for lake levels and stream flows in the Rainy River watershed are available from several public sources including the HYDAT database from the Water Survey of Canada (WSC). The HYDAT database database consists of Microsoft Access .mdb file providing data on daily water flows and levels throughout Canada.

For the purposes of these notebooks, individual tables (STATIONS, DLY_FLOWS, and DLY_LEVELS) were extracted from the database as .csv files with MDB Explorer. This notebook then reads the ..csv file to extract information relevant to the Rainy River basin and stores the results in this repository's data directory. Due to size constraints , only the data relevant to the Rainy River Watershed is included in the repository. Neither the HYDAT database or extracted .csv files are included in this repository.

This notebooks is run whenever a new version of the HYDAT database becomes available, normally on a quarterly basis, or if additional flow and level stations are needed in the repository data cache.

Table of Contents¶

Initialization: Load needed libaries and data tables from the HYDAT database.
[Monitoring Stations in the Rainy River Watershed](#Monitoring Stations): Create pandas dataframes for all stations with a geographic box bounded by latitude and longitude. Produces a pickled dataframe WSC_STATIONS in the data directory.
Pickle Level Data to WSC_LEVELS: Extract level data from the HYDAT database and pickle to WSC_LEVELS data set. The pickled dataframe is indexed by date, with columns tagged by station name.
Pickle Flow Data to WSC_FLOWS: Extract level data from the HYDAT database and pickle to WSC_LEVELS data set. The pickled dataframe is indexed by date, with columns tagged by station name.

Initialization and Loading of HYDAT Data Tables¶

In [1]:

# Standard Python modules
import pandas as pd
from IPython.core.display import display

In [2]:

# Directory where data files have been stored
HYDAT_dir = "../data/HYDAT/"

# Read data files as a pandas dataframes
STATIONS = pd.read_csv(HYDAT_dir + 'STATIONS.csv', index_col = 0);
DLY_LEVELS = pd.read_csv(HYDAT_dir + 'DLY_LEVELS.csv')
DLY_FLOWS = pd.read_csv(HYDAT_dir + 'DLY_FLOWS.csv')

print("    Stations in the HYDAT database = ", len(STATIONS.index))

    Stations in the HYDAT database =  7775

Monitoring Stations in the Rainy River Watershed¶

The following cell creates a pandas dataframe of monitoring stations from the STATIONS.csv table extracted from the HYDAT database. The extaction searches for all stations with a specified region bounded by latitude and longitudes.

For reference, this is a map of the Rainy River drainage basin available from the International Joint Commission.

In [3]:

# bounding region
lngW = -94.0
lngE = -90.5
latS = 47.5
latN = 49.0

# locate monitoring stations within bounding region
stationList = STATIONS[(STATIONS['LATITUDE'] >= latS) & \
         (STATIONS['LATITUDE'] <= latN) & \
         (STATIONS['LONGITUDE'] <= lngE) & \
         (STATIONS['LONGITUDE'] >= lngW)].index

# reduce data sets to the area of interest
WSC_STATIONS = STATIONS.ix[stationList]
DLY_LEVELS = DLY_LEVELS.ix[[idx for idx in DLY_LEVELS.index if DLY_LEVELS.ix[idx,'STATION_NUMBER'] in stationList]]
DLY_FLOWS = DLY_FLOWS.ix[[idx for idx in DLY_FLOWS.index if DLY_FLOWS.ix[idx,'STATION_NUMBER'] in stationList]]

# add columns to STATIONS for level and flow stations
WSC_STATIONS['Level'] = ''
WSC_STATIONS['Flow'] = ''

# mark level and flow stations
WSC_STATIONS.ix[set(DLY_LEVELS['STATION_NUMBER']).intersection(WSC_STATIONS.index),'Level'] = True
WSC_STATIONS.ix[set(DLY_FLOWS['STATION_NUMBER']).intersection(WSC_STATIONS.index),'Flow'] = True

print("Stations within region of interest = ", len(WSC_STATIONS.index))
print("DLY_FLOWS records = ", len(DLY_FLOWS.index))
print("DLY_LEVELS records = ", len(DLY_LEVELS.index))
display(WSC_STATIONS.ix[:,['Level','Flow','STATION_NAME','LATITUDE','LONGITUDE',]])

WSC_STATIONS.to_pickle('../data/WSC_STATIONS')

Stations within region of interest =  34
DLY_FLOWS records =  7965
DLY_LEVELS records =  5913

	Level	Flow	STATION_NAME	LATITUDE	LONGITUDE
STATION_NUMBER
05PA001		True	KETTLE RIVER ABOVE KETTLE FALLS	48.49444	-92.64444
05PA003	True		NAMAKAN LAKE ABOVE KETTLE FALLS DAM	48.50000	-92.63886
05PA005	True		NORTHERN LIGHT LAKE AT OUTLET	48.24028	-90.76889
05PA006	True	True	NAMAKAN RIVER AT OUTLET OF LAC LA CROIX	48.38256	-92.17631
05PA007	True		CROOKED LAKE NEAR CURTAIN FALLS	48.23750	-91.90611
05PA010	True		FRENCH LAKE NEAR ATIKOKAN	48.67222	-91.13500
05PA011	True		LAC LA CROIX AT CAMPBELL'S CAMP	48.35508	-92.21744
05PA012	True	True	BASSWOOD RIVER NEAR WINTON	48.08256	-91.65117
05PA013	True		NAMAKAN LAKE AT SQUIRREL ISLAND	48.49686	-92.65856
05PB001		True	SEINE RIVER NEAR LA SEINE	48.75000	-92.20000
05PB002	True		LITTLE TURTLE LAKE NEAR MINE CENTRE	48.77222	-92.60833
05PB003		True	MANITOU RIVER ABOVE DEVIL'S CASCADE	48.97500	-93.34167
05PB004		True	FOOTPRINT RIVER AT RAINY LAKE FALLS	48.85833	-93.57500
05PB007	True		RAINY LAKE NEAR FORT FRANCES	48.64912	-93.32068
05PB009		True	SEINE RIVER AT STURGEON FALLS GENERATING STATION	48.74444	-92.28472
05PB012	True		LAC DES MILLE LACS ABOVE OUTLET DAM	48.98000	-90.73000
05PB014	True	True	TURTLE RIVER NEAR MINE CENTRE	48.85022	-92.72383
05PB015		True	PIPESTONE RIVER ABOVE RAINY LAKE	48.56861	-92.52417
05PB018	True	True	ATIKOKAN RIVER AT ATIKOKAN	48.75197	-91.58408
05PB019		True	NORTHEAST TRIBUTARY TO DASHWA LAKE NEAR ATIKOKAN	48.95083	-91.71222
05PB020		True	EASTERN TRIBUTARY TO DASHWA LAKE NEAR ATIKOKAN	48.94056	-91.69833
05PB021		True	EYE RIVER NEAR HARDTACK LAKE NORTH OF ATIKOKAN	48.92500	-91.66222
05PB022		True	EYE RIVER NEAR COULSON LAKE NORTH OF ATIKOKAN	48.89444	-91.66750
05PB023	True		RAINY LAKE AT NORTHWEST BAY	48.84167	-93.62000
05PB024	True		RAINY LAKE NEAR BEAR PASS	48.70058	-92.95800
05PB025	True		RAINY LAKE AT STOKES BAY	48.53611	-92.56111
05PC009		True	LA VALLEE RIVER AT LA VALLEE	48.62083	-93.62500
05PC010		True	STURGEON RIVER NEAR BARWICK	48.68750	-93.98333
05PC016		True	LA VALLEE RIVER NEAR DEVLIN	48.59028	-93.67278
05PC018	True	True	RAINY RIVER AT MANITOU RAPIDS	48.63447	-93.91336
05PC019		True	RAINY RIVER AT FORT FRANCES	48.60853	-93.40344
05PC022	True	True	LA VALLEE RIVER NEAR BURRISS	48.67844	-93.66522
05PC024	True		RAINY RIVER AT PITHERS POINT SITE NO.1	48.61389	-93.35472
05PC025	True		RAINY RIVER AT PITHERS POINT SITE NO.2	48.61625	-93.35992

Pickle Level Data to WSC_LEVELS¶

In [4]:

def getLevelsWSC(s):
    '''Return a time series for levels at a given station.'''
    ts = {}   
    data = DLY_LEVELS[DLY_LEVELS['STATION_NUMBER'] == s]
    for k in data.index:
        mo = str(data.ix[k,'MONTH'])
        yr = str(data.ix[k,'YEAR'])
        for n in range(1,data.ix[k,'NO_DAYS']+1):
            ts[pd.to_datetime(mo+'/'+str(n)+'/'+yr)] = data.ix[k,'LEVEL'+str(n)]  
    ts = pd.Series(ts)
    #drop initial and terminal null entries
    j = 0
    while j<len(ts.index) and pd.isnull(ts.ix[j]):
        j += 1
    k = len(ts.index) - 1
    while k>=j and pd.isnull(ts.ix[k]):
        k += -1
    return ts[j:k+1]

WSC_LEVELS = pd.DataFrame({s:getLevelsWSC(s) for s in WSC_STATIONS[WSC_STATIONS['Level']==True].index})
WSC_LEVELS.to_pickle('../data/WSC_LEVELS')
WSC_LEVELS.tail()

Out[4]:

	05PA003	05PA005	05PA006	05PA007	05PA010	05PA011	05PA012	05PA013	05PB002	05PB007	05PB012	05PB014	05PB018	05PB023	05PB024	05PB025	05PC018	05PC022	05PC024	05PC025
2014-12-27	NaN	NaN	360.451	NaN	NaN	NaN	NaN	339.987	NaN	NaN	NaN	0.915	386.296	8.433	337.361	NaN	NaN	10.775	337.325	337.324
2014-12-28	NaN	NaN	360.449	NaN	NaN	NaN	NaN	339.981	NaN	NaN	NaN	0.913	386.287	8.426	337.356	NaN	NaN	10.779	337.318	337.317
2014-12-29	NaN	NaN	360.447	NaN	NaN	NaN	NaN	339.975	NaN	NaN	NaN	0.910	386.295	8.421	337.351	NaN	NaN	10.788	337.313	337.312
2014-12-30	NaN	NaN	360.445	NaN	NaN	NaN	NaN	339.968	NaN	NaN	NaN	0.907	386.297	8.418	337.346	NaN	NaN	10.796	337.308	337.307
2014-12-31	NaN	NaN	360.444	NaN	NaN	NaN	NaN	339.962	NaN	NaN	NaN	0.904	386.303	8.418	337.341	NaN	NaN	10.795	337.303	337.301

Pickle Flow Data to WSC_FLOWS¶

The DLY_FLOW data is encoded in an irregular tabular format where rows are indexed by station code, year, and month, and columns are indexed by date. Given a station code, the following function decodes DLY_FLOW to produce a pandas times series of flow rates. The function is used to create a pandas dataframe for all flow stations in WSC_STATIONS, then pickles the results to a data file for use by other notebooks in this repository.

In [5]:

def getFlowsWSC(s):
    ts = {}
    data = DLY_FLOWS[DLY_FLOWS['STATION_NUMBER'] == s]
    for k in data.index:
        mo = str(data.ix[k,'MONTH'])
        yr = str(data.ix[k,'YEAR'])
        for n in range(1,data.ix[k,'NO_DAYS']+1):
            ts[pd.to_datetime(mo+'/'+str(n)+'/'+yr)] = data.ix[k,'FLOW'+str(n)]  
    ts = pd.Series(ts)
    ts.name = s +': ' + STATIONS.ix[s,'STATION_NAME'] + \
        ' from ' + '{0}'.format(ts.index[0].year) + \
        ' to ' + '{0}'.format(ts.index[-1].year)
    #drop initial and terminal null entries
    j = 0
    while j<len(ts.index) and pd.isnull(ts.ix[j]):
        j += 1
    k = len(ts.index) - 1
    while k>=j and pd.isnull(ts.ix[k]):
        k += -1
    return ts[j:k+1]

WSC_FLOWS = pd.DataFrame({s: getFlowsWSC(s) for s in WSC_STATIONS[WSC_STATIONS['Flow']==True].index})
WSC_FLOWS.to_pickle('../data/WSC_FLOWS')
WSC_FLOWS.tail()

Out[5]:

	05PA001	05PA006	05PA012	05PB001	05PB003	05PB004	05PB009	05PB014	05PB015	05PB018	05PB019	05PB020	05PB021	05PB022	05PC009	05PC010	05PC016	05PC018	05PC019	05PC022
2014-12-27	NaN	36.7	NaN	NaN	NaN	NaN	36.9	22.5	NaN	1.27	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	170.0	0.027
2014-12-28	NaN	36.6	NaN	NaN	NaN	NaN	37.4	22.4	NaN	1.26	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	170.0	0.026
2014-12-29	NaN	36.5	NaN	NaN	NaN	NaN	37.8	22.2	NaN	1.25	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	169.0	0.026
2014-12-30	NaN	36.4	NaN	NaN	NaN	NaN	37.1	22.0	NaN	1.24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	169.0	0.026
2014-12-31	NaN	36.3	NaN	NaN	NaN	NaN	38.0	21.9	NaN	1.24	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	169.0	0.026