We are going to use the newly crafted OOI Machine-2-Machine (M2M) interface to pull in the data for the same time period (June 2016) that we looked at in the read_raw_spkir notebook. For this to work there is a bunch of info you'll need, outlined in the tutorials on the OOI Community Tools website. In particular, see the example tutorial for accessing NetCDF data.
import datetime
import json
import netrc
import requests
import time
import xarray as xr
from bokeh.models import Range1d, LinearAxis
from bokeh.plotting import figure, show
from bokeh.palettes import Colorblind as palette
from bokeh.io import output_notebook
import warnings
warnings.filterwarnings('ignore')
As noted in the tutorial, you need access credentials to download the data. I use netrc files to store these credentials. They are normally stored in your home directory, but you can store them anywhere and then pass that path into the netrc object when you initialize it.
For accessing the OOI M2M data portal, use your API username and token that can be found by clicking on your username in the Data Portal, and selecting "User Profile". Then put the following info into your netrc file:
machine ooinet.oceanobservatories.org
login <API Username>
password <API Token>
creds = netrc.netrc('C:\\ooi\\ooinet.txt')  # explicitly setting the path on this Windows machine
auth = creds.authenticators('ooinet.oceanobservatories.org')
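If you'd rather skip the netrc file entirely (say, for a quick test), you can build the same tuple by hand. This is just a sketch with placeholder strings; the 3-tuple shape mirrors what netrc.authenticators returns (login, account, password), so the requests call below works unchanged.
# Hypothetical alternative: supply the API credentials directly instead of
# reading them from a netrc file (fine for quick tests, not for shared code).
# The 3-tuple mirrors netrc.authenticators' (login, account, password).
# auth = ('<API Username>', None, '<API Token>')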
# Based on the reference designator for the Oregon Shelf Surface Mooring, Near Surface Instrument Frame,
# Spectral Irradiance (CE02SHSM-RID26-08-SPKIRB000), we can begin to construct the required URL for
# the data request. We also need the delivery method (telemetered) and the data stream name, which is
# listed on the Data Portal as spkir_abj_dcl_instrument.
DATA_API_BASE_URL = 'https://ooinet.oceanobservatories.org/api/m2m/12576/sensor/inv/'
data_request_url = (DATA_API_BASE_URL +
                    'CE02SHSM/' +                          # Site designator
                    'RID26/' +                             # Node designator
                    '08-SPKIRB000/' +                      # Instrument designator
                    'telemetered/' +                       # Data delivery method
                    'spkir_abj_dcl_instrument' + '?' +     # Data stream name
                    'beginDT=2016-06-01T00:00:00.000Z&' +  # Beginning time range
                    'endDT=2016-06-30T23:59:59.999Z&' +    # Ending time range
                    'format=application/netcdf')           # Specifying we want NetCDF data files
Note, the tutorial does not include the format portion of the URL. You need this if you are requesting NetCDF files.
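As an aside, the same request can be assembled by letting requests encode the query string for you. This is a minimal sketch using the endpoint pieces from the cell above (not how the tutorial does it); the final line is commented out so it doesn't re-issue the request.
# Sketch: build the same request with requests' params handling, which
# URL-encodes the query string. Endpoint pieces are copied from above.
request_url = (DATA_API_BASE_URL +
               'CE02SHSM/RID26/08-SPKIRB000/telemetered/spkir_abj_dcl_instrument')
params = {
    'beginDT': '2016-06-01T00:00:00.000Z',
    'endDT': '2016-06-30T23:59:59.999Z',
    'format': 'application/netcdf'
}
# r = requests.get(request_url, params=params, auth=(auth[0], auth[2]))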
# Put the request in for the data.
r = requests.get(data_request_url, auth=(auth[0], auth[2]))
data = r.json()
print(json.dumps(data, indent=2))
{ "outputURL": "https://opendap.oceanobservatories.org/thredds/catalog/ooi/ooice.platforms@gmail.com/20180201T025052-CE02SHSM-RID26-08-SPKIRB000-telemetered-spkir_abj_dcl_instrument/catalog.html", "allURLs": [ "https://opendap.oceanobservatories.org/thredds/catalog/ooi/ooice.platforms@gmail.com/20180201T025052-CE02SHSM-RID26-08-SPKIRB000-telemetered-spkir_abj_dcl_instrument/catalog.html", "https://opendap.oceanobservatories.org/async_results/ooice.platforms@gmail.com/20180201T025052-CE02SHSM-RID26-08-SPKIRB000-telemetered-spkir_abj_dcl_instrument" ], "sizeCalculation": 51532278, "requestUUID": "ac17487b-5452-4504-8dce-b382717927ab", "numberOfSubJobs": 3, "timeCalculation": 211 }
Some requests may take a while to process. The tutorial provides example code for automating a check to see if the request is completed. I've copied that code below.
The ultimate result of the request is a link to the OOI THREDDS Data Server, where you need to find the NetCDF file(s) generated by your request. The functions below the check will help you find those files, download them, and (as with the example for reading the raw SPKIR data) apply a median average to the burst data (collected at ~1 Hz for 3 minutes every 15 minutes), creating a cleaned up and simplified dataset that we can use for further work.
%%time
check_complete = data['allURLs'][1] + '/status.txt'  # When SOA is actually not that efficient...
for i in range(1000):
    r = requests.get(check_complete)
    if r.status_code == requests.codes.ok:
        print('request completed')
        break
    else:
        time.sleep(.5)
request completed
Wall time: 1min 1s
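If you'd rather bound the wait by wall-clock time than by iteration count, here is a small variant of the same status.txt check. It reuses check_complete from above; the 10-minute deadline is an arbitrary choice, not something from the tutorial.
# Variant of the poll above with an explicit deadline instead of a fixed
# iteration count; same status.txt check, reusing check_complete from above.
deadline = time.monotonic() + 600  # arbitrary 10-minute cap
while time.monotonic() < deadline:
    if requests.get(check_complete).status_code == requests.codes.ok:
        print('request completed')
        break
    time.sleep(1)
else:
    print('timed out waiting for the request to complete')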
# Add some additional modules
from bs4 import BeautifulSoup
import re
# Function to create a list of the data files of interest
def list_files(url, tag=''):
    page = requests.get(url).text
    soup = BeautifulSoup(page, 'html.parser')
    pattern = re.compile(str(tag))
    return [node.get('href') for node in soup.find_all('a', text=pattern)]
# Function to download one of the NetCDF files, parse it, apply median-averaging
# to the bursts and create a final, cleaned-up dataset.
def process_file(file):
    # Download the data and load it into an xarray dataset
    baseurl = 'https://opendap.oceanobservatories.org/thredds/dodsC/'
    url = re.sub(r'catalog.html\?dataset=', baseurl, file)
    ds = xr.open_dataset(url).load()
    ds = ds.swap_dims({'obs': 'time'})
    ds = ds.drop(['obs', 'id', 'dcl_controller_timestamp', 'driver_timestamp', 'ingestion_timestamp',
                  'internal_timestamp', 'port_timestamp', 'preferred_timestamp', 'provenance',
                  'passed_checksum', 'quality_flag', 'instrument_id'])
    ds = ds.rename({'channel_array': 'raw_channels',
                    'spkir_abj_cspp_downwelling_vector': 'spectral_irradiance'})
    # Set the burst index and create the median averaged burst dataset
    bursts = ds.resample(time='15Min').median()
    return bursts
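To make the burst averaging concrete, here is a minimal, self-contained sketch on synthetic data (not the real SPKIR stream) showing what `resample(time='15Min').median()` does to 1 Hz bursts:
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic example: two 3-minute bursts of 1 Hz samples, 15 minutes apart,
# mimicking the SPKIR burst scheme described above.
t0 = pd.date_range('2016-06-01 00:00', periods=180, freq='1S')
t1 = pd.date_range('2016-06-01 00:15', periods=180, freq='1S')
demo = xr.Dataset({'irr': ('time', np.random.rand(360))},
                  coords={'time': t0.append(t1)})

# Each 15-minute bin collapses to the median of its burst -> two values.
print(demo.resample(time='15Min').median())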
# Create a list of the files from June using a simple regex as a tag to discriminate the files
files = list_files(data['allURLs'][0], r'.*SPKIR.*\.nc$')
# Process the data files for June and concatenate into a single dataframe
frames = [process_file(f) for f in files]
june = xr.concat(frames, 'time')
june.coords['spectra'] = [412, 444, 490, 510, 555, 620, 683]
june
<xarray.Dataset> Dimensions: (spectra: 7, time: 2880) Coordinates: * time (time) datetime64[ns] 2016-06-01 ... * spectra (spectra) int32 412 444 490 510 555 620 683 Data variables: deployment (time) float64 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 ... raw_channels (time, spectra) float64 2.159e+09 2.163e+09 ... frame_counter (time) float64 87.5 88.0 85.0 86.0 87.5 87.5 84.5 ... internal_temperature (time) float64 129.0 129.0 129.0 129.0 129.0 129.0 ... sample_delay (time) float64 -133.0 -133.0 -133.0 -133.0 -133.0 ... timer (time) float64 92.75 93.23 90.35 91.31 92.75 92.75 ... va_sense (time) float64 177.5 178.0 177.0 178.0 178.0 178.0 ... vin_sense (time) float64 280.0 279.0 280.0 279.0 280.0 280.0 ... spectral_irradiance (time, spectra) float64 3.234 4.58 9.184 10.8 ...
# Plot the data for the month
output_notebook()
# make a list of our columns
cols = ['412 nm', '444 nm', '490 nm', '510 nm', '555 nm', '620 nm', '683 nm']
colors = palette[7]
# make the figure,
p = figure(x_axis_type="datetime", title="Spectral Irradiance at 7 m -- June 2016", width=850, height=500)
p.xaxis.axis_label = 'Date'
p.yaxis.axis_label = 'Irradiance [uW cm^-2 nm^-1]'
p.y_range = Range1d(start=0, end=100)
for i in range(7):
    p.line(june.time.values, june.spectral_irradiance[:, i].values, color=colors[i], legend=cols[i])
p.toolbar_location = 'above'
show(p)
# Plot the irradiance spectra for the month (just show every tenth burst, ~2.5 hours apart)
output_notebook()
# make the figure,
p = figure(title="Spectral Irradiance -- June 2016", width=850, height=500)
p.xaxis.axis_label = 'Wavelength (nm)'
p.x_range = Range1d(start=400, end=700)
p.yaxis.axis_label = 'Irradiance [uW cm^-2 nm^-1]'
p.y_range = Range1d(start=0, end=80.0)
for i in range(0, len(june.time), 10):
    p.line(june.spectra.values, june.spectral_irradiance[i, :].values)
p.toolbar_location = 'above'
show(p)
june['time'] = june.time.values.astype(float) / 10.0**9  # Convert datetime64[ns] values to seconds since 1970
june.to_netcdf('C:\\ooi\\ce02shsm_june2016_ooinet_spkir.nc')
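When reading the saved file back, remember that the time coordinate was converted to float seconds since 1970. A minimal sketch for restoring it (the path matches the save above):
import pandas as pd

# Re-open the saved file and convert the float seconds back to datetime64,
# undoing the conversion applied just before saving.
ds = xr.open_dataset('C:\\ooi\\ce02shsm_june2016_ooinet_spkir.nc')
ds['time'] = pd.to_datetime(ds.time.values, unit='s')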