We are going to use the newly crafted OOI Machine-2-Machine (M2M) interface to pull in the data for the same time period (June 2016) that we looked at in the read_raw_flort notebook. In order for this to work there is a bunch of info you'll need, outlined in tutorials on the OOI Community Tools website. In particular, see the example tutorial for accessing NetCDF data.
import datetime
import json
import netrc
import requests
import time
import xarray as xr
from bokeh.models import Range1d, LinearAxis
from bokeh.plotting import figure, show
from bokeh.palettes import Colorblind as palette
from bokeh.io import output_notebook
import warnings
warnings.filterwarnings('ignore')
As noted in the tutorial, you need access credentials to download the data. I use netrc files to store these credentials. They are normally stored in your home directory, but you can store them anywhere and then pass that path into the netrc object when you initialize it. There is more information available online about the use of netrc files on Windows, MacOS and Linux systems.
For accessing the OOI M2M data portal, use your API username and token that can be found by clicking on your username in the Data Portal, and selecting "User Profile". Then put the following info into your netrc file:
ooinet.oceanobservatories.org
login <API Username>
password <API Token>
netrc = netrc.netrc('C:\\ooi\\ooinet.txt') # explicitly setting this on the Windows machine
auth = netrc.authenticators('ooinet.oceanobservatories.org')
# Based on the reference designator for the Oregon Shelf Surface Mooring, Near Surface Instrument Frame,
# 3-Wavelength Fluorometer (CE02SHSM-RID27-02-FLORTD000), we can begin to construct the required URL for
# the data request. We also need the delivery method (telemetered) and the data stream name, which is
# listed on the Data Portal as flort_sample.
DATA_API_BASE_URL = 'https://ooinet.oceanobservatories.org/api/m2m/12576/sensor/inv/'
data_request_url = (DATA_API_BASE_URL +
'CE02SHSM/' + # Site designator
'RID27/' + # Node designator
'02-FLORTD000/' + # Instrument designator
'telemetered/' + # Data delivery method
'flort_sample' + '?' + # Data stream name
'beginDT=2016-06-01T00:00:00.000Z&' + # Beginning time range
'endDT=2016-06-30T23:59:59.999Z&' + # Ending time range
'format=application/netcdf') # Specifying we want NetCDF data files
Note, the tutorial linked above does not include the format portion of the URL. You need this if you are requesting NetCDF files.
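As an aside (this is not part of the tutorial), requests can encode the query string for you instead of concatenating it by hand. A minimal sketch with the same endpoint and parameters, preparing the request without sending it:

```python
import requests

# Sketch: let requests build the query string. The endpoint and parameters
# mirror the hand-built URL above; nothing is sent over the network here.
DATA_API_BASE_URL = 'https://ooinet.oceanobservatories.org/api/m2m/12576/sensor/inv/'
url = DATA_API_BASE_URL + 'CE02SHSM/RID27/02-FLORTD000/telemetered/flort_sample'
params = {
    'beginDT': '2016-06-01T00:00:00.000Z',
    'endDT': '2016-06-30T23:59:59.999Z',
    'format': 'application/netcdf',
}
# Prepare (but do not send) the request to inspect the encoded URL
prepared = requests.Request('GET', url, params=params).prepare()
print(prepared.url)
```

Note that requests percent-encodes the colons and slashes in the parameter values; the hand-built URL above is what the tutorial itself uses.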
# Put the request in for the data (this will generate an email to the account you used to sign in with).
r = requests.get(data_request_url, auth=(auth[0], auth[2]))
data = r.json()
print(json.dumps(data, indent=2))
{
  "sizeCalculation": 427107000,
  "requestUUID": "f68409bf-4993-4a64-89ac-9412505ec894",
  "outputURL": "https://opendap.oceanobservatories.org/thredds/catalog/ooi/ooice.platforms@gmail.com/20180201T025109-CE02SHSM-RID27-02-FLORTD000-telemetered-flort_sample/catalog.html",
  "numberOfSubJobs": 15,
  "timeCalculation": 1751,
  "allURLs": [
    "https://opendap.oceanobservatories.org/thredds/catalog/ooi/ooice.platforms@gmail.com/20180201T025109-CE02SHSM-RID27-02-FLORTD000-telemetered-flort_sample/catalog.html",
    "https://opendap.oceanobservatories.org/async_results/ooice.platforms@gmail.com/20180201T025109-CE02SHSM-RID27-02-FLORTD000-telemetered-flort_sample"
  ]
}
Some requests may take a while to process. The tutorial provides example code for automating a check to see if the request has completed. I've copied that code below.
The ultimate result of the request is a link to the OOI THREDDS Data Server, from which you need to find the NetCDF file(s) produced by your request. The functions defined after the status check will help you find those files, download them, and (as with the example for reading the raw FLORT data) apply a median average to the burst data (collected at ~1 Hz for 3 minutes every 15 minutes), creating a cleaned-up and simplified dataset that we can use for further work.
%%time
check_complete = data['allURLs'][1] + '/status.txt'  # When SOA is actually not that efficient...
for i in range(1000):
    r = requests.get(check_complete)
    if r.status_code == requests.codes.ok:
        print('request completed')
        break
    else:
        time.sleep(.5)
request completed
Wall time: 44.1 s
# Add some additional modules
from bs4 import BeautifulSoup
import re
# Function to create a list of the data files of interest
def list_files(url, tag=''):
    page = requests.get(url).text
    soup = BeautifulSoup(page, 'html.parser')
    pattern = re.compile(str(tag))
    return [node.get('href') for node in soup.find_all('a', text=pattern)]
# Function to download one of the NetCDF files, convert it to a dataset, apply median-averaging to the
# bursts and create a final dataset.
def process_file(file):
    # download and convert the data
    baseurl = 'https://opendap.oceanobservatories.org/thredds/dodsC/'
    url = re.sub(r'catalog.html\?dataset=', baseurl, file)
    ds = xr.open_dataset(url).load()  # download the data first, before converting
    ds = ds.swap_dims({'obs': 'time'})
    ds = ds.drop(['obs', 'id', 'driver_timestamp', 'ingestion_timestamp', 'internal_timestamp', 'pressure_depth',
                  'port_timestamp', 'preferred_timestamp', 'provenance', 'quality_flag', 'suspect_timestamp'])
    # resample the burst data into median-averaged bursts
    bursts = ds.resample(time='15Min').median()
    return bursts
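To illustrate what the resample step in `process_file` does, here is a tiny hypothetical burst series (made-up values, not real FLORT data) collapsing to one median value per 15-minute window:

```python
import pandas as pd
import xarray as xr

# Hypothetical burst data: a few 1 Hz samples falling in two separate
# 15-minute windows.
times = pd.to_datetime(['2016-06-01 00:00:00', '2016-06-01 00:00:01',
                        '2016-06-01 00:00:02', '2016-06-01 00:15:00',
                        '2016-06-01 00:15:01'])
ds = xr.Dataset({'chl': ('time', [1.0, 2.0, 9.0, 4.0, 6.0])},
                coords={'time': times})

# Each window collapses to its median: [1, 2, 9] -> 2.0 and [4, 6] -> 5.0
bursts = ds.resample(time='15Min').median()
print(bursts['chl'].values)  # [2. 5.]
```

The median is preferred over the mean here because it discards spikes within a burst rather than averaging them in.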
# Create a list of the files from June using a simple regex as a tag to discriminate the files
files = list_files(data['allURLs'][0], r'.*FLORT.*\.nc$')
# Process the data files for June and concatenate into a single dataset
frames = [process_file(f) for f in files]
june = xr.concat(frames, 'time')
june
<xarray.Dataset>
Dimensions:                              (time: 2880)
Coordinates:
  * time                                 (time) datetime64[ns] 2016-06-01 ...
Data variables:
    deployment                           (time) float64 3.0 3.0 3.0 3.0 3.0 ...
    measurement_wavelength_beta          (time) float64 700.0 700.0 700.0 ...
    measurement_wavelength_cdom          (time) float64 460.0 460.0 460.0 ...
    measurement_wavelength_chl           (time) float64 695.0 695.0 695.0 ...
    raw_internal_temp                    (time) float64 553.0 553.0 553.0 ...
    raw_signal_beta                      (time) float64 580.0 500.5 590.0 ...
    raw_signal_cdom                      (time) float64 71.0 70.0 70.0 68.0 ...
    raw_signal_chl                       (time) float64 972.0 879.0 ...
    fluorometric_chlorophyll_a           (time) float64 6.489 5.838 6.804 ...
    fluorometric_cdom                    (time) float64 2.109 2.028 2.028 ...
    total_volume_scattering_coefficient  (time) float64 0.001026 0.0008717 ...
    temp                                 (time) float64 11.13 10.99 11.05 ...
    practical_salinity                   (time) float64 32.15 32.17 32.2 ...
    seawater_scattering_coefficient      (time) float64 0.0006266 0.0006269 ...
    optical_backscatter                  (time) float64 0.006915 0.005875 ...
# Provide a simple plot of the month's worth of data
output_notebook()
# make a list of our columns
cols = ['fluorometric_chlorophyll_a', 'optical_backscatter']
colors = palette[3]
# make the figure,
p = figure(x_axis_type="datetime", title="Chlorophyll and Backscatter -- June 2016", width = 850, height = 500)
p.xaxis.axis_label = 'Date'
p.yaxis.axis_label = 'Estimated Chlorophyll [mg/L]'
p.y_range = Range1d(start=0, end=15)
p.extra_y_ranges['bback'] = Range1d(start=0, end=0.025)
p.add_layout(LinearAxis(y_range_name='bback', axis_label='Particulate Backscatter [m^-1]'), 'right')
p.line(june.time.values, june[cols[0]].values / 2.0, color=colors[0], legend=cols[0])
p.line(june.time.values, june[cols[1]].values, color=colors[1], legend=cols[1], y_range_name = 'bback')
p.toolbar_location = 'above'
show(p)
# make a list of our columns
cols = ['fluorometric_cdom', 'temp', 'practical_salinity']
colors = palette[3]
# make the figure,
p = figure(x_axis_type="datetime", title="CDOM with Co-Located Temperature and Salinity", width = 850, height = 500)
p.xaxis.axis_label = 'Date'
p.yaxis.axis_label = 'Fluorometric CDOM [ppb]'
p.y_range = Range1d(start=0, end=5)
p.extra_y_ranges['temp'] = Range1d(start=9, end=18)
p.add_layout(LinearAxis(y_range_name='temp', axis_label='Temperature [degC]'), 'right')
p.extra_y_ranges['psu'] = Range1d(start=30, end=34)
p.add_layout(LinearAxis(y_range_name='psu', axis_label='Salinity [psu]'), 'right')
p.line(june.time.values, june[cols[0]].values, color=colors[0], legend='cdom')
p.line(june.time.values, june[cols[1]].values, color=colors[1], legend='degC', y_range_name = 'temp')
p.line(june.time.values, june[cols[2]].values, color=colors[2], legend='psu', y_range_name = 'psu')
p.toolbar_location = 'above'
show(p)
At this point, you have the option to save the data, or to apply the processing routines available in pyseas and cgsn_processing to convert the data from raw engineering units to scientific units, using the calibration coefficients that are available online. An example of how those steps work is available here.
june['time'] = june.time.values.astype(float) / 10.0**9 # Convert from datetime object in nanoseconds to seconds since 1970
june.to_netcdf('C:\\ooi\\ce02shsm_june2016_ooinet_flort.nc')
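Since the time axis was converted to float seconds before saving, you'll want to convert it back when you re-open the file. A small round-trip sketch (using a hypothetical stand-in dataset rather than re-opening the file above) shows the conversion is reversible with pandas:

```python
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the june dataset above, used to demonstrate the
# float-seconds round trip without touching the saved file.
times = pd.date_range('2016-06-01', periods=3, freq='15min')
ds = xr.Dataset({'chl': ('time', [1.0, 2.0, 3.0])}, coords={'time': times})

# forward: datetime64[ns] -> seconds since 1970, as in the cell above
ds['time'] = ds.time.values.astype(float) / 10.0**9

# reverse: seconds since 1970 -> datetime64[ns], e.g. after xr.open_dataset()
ds['time'] = pd.to_datetime(ds.time.values, unit='s')
print((ds.time.values == times.values).all())  # True
```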