Exploring ONC Data Web Services

An exploration of the Ocean Networks Canada (ONC) data web services documented at https://wiki.oceannetworks.ca/display/help/API.

In [1]:
import json
import os
from urllib.parse import (
    quote,
    urlencode,
)

import arrow
import numpy as np
import requests
import xarray as xr

Web Service Requests

The web service URLs are composed of:

  • the base URL: http://dmas.uvic.ca/api/
  • an end-point: presently one of archivefiles, dataproducts, rawdata, scalardata, stations, or status
  • a query string that includes method=methodName, token=USER_TOKEN, and other end-point-specific key/value pairs

For example:

http://dmas.uvic.ca/api/scalardata?method=getByStation&token=<yourValidToken>&station=SCHDW.O1&deviceCategory=OXYSENSOR

Access to the web services requires a user token which you can generate on the Web Services API tab of your ONC account profile page. I have stored mine in an environment variable so as not to publish it to the world in this notebook.

In [3]:
TOKEN = os.environ['ONC_USER_TOKEN']
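If ONC_USER_TOKEN is not set, the cell above fails with a bare KeyError. A slightly friendlier sketch follows; get_onc_token is a hypothetical helper, not part of any ONC library, and it only reads the same environment variable used above:

```python
import os


def get_onc_token(var_name='ONC_USER_TOKEN'):
    """Return the ONC web services token stored in an environment variable.

    Hypothetical helper: the default variable name matches the one used in this notebook.
    """
    token = os.environ.get(var_name)
    if not token:
        # Say which variable to set, rather than failing with a bare KeyError
        raise RuntimeError(
            f'Set the {var_name} environment variable to the token from the '
            'Web Services API tab of your ONC account profile page')
    return token
```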

Here's a string template for a request URL, and a technique for composing the query string:

In [4]:
url_tmpl = 'http://dmas.uvic.ca/api/{endpoint}?{query}'
In [5]:
url_tmpl.format(
    endpoint='scalardata',
    query=urlencode({
        'method': 'getByStation',
        'token': 'USER_TOKEN',
        'station': 'SCHDW.O1',
        'deviceCategory': 'OXYSENSOR',
    }, quote_via=quote, safe='/:'))
Out[5]:
'http://dmas.uvic.ca/api/scalardata?method=getByStation&token=USER_TOKEN&station=SCHDW.O1&deviceCategory=OXYSENSOR'
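The cell above passes quote_via=quote and safe='/:' to urlencode. A short comparison with urlencode's defaults shows what that choice changes for query items like the timestamps and sensor lists used later in this notebook:

```python
from urllib.parse import quote, urlencode

# Query items like those used later: an ISO 8601 timestamp (contains colons)
# and a comma-separated sensor list
query = {
    'dateFrom': '2016-06-21T17:58:45.000Z',
    'sensors': 'salinity,temperature',
}

# urlencode's default quote_via is quote_plus with safe='', so ':' becomes %3A
default_qs = urlencode(query)
# The notebook's choice: quote, with '/' and ':' left unescaped
notebook_qs = urlencode(query, quote_via=quote, safe='/:')

print(default_qs)   # dateFrom=2016-06-21T17%3A58%3A45.000Z&sensors=salinity%2Ctemperature
print(notebook_qs)  # dateFrom=2016-06-21T17:58:45.000Z&sensors=salinity%2Ctemperature
```

The comma is percent-encoded either way. This only shows what each option sends; it does not establish whether the ONC server requires literal colons.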

Substituting my real token, and using the requests package to send the data request to the web service:

In [6]:
data_url = url_tmpl.format(
    endpoint='scalardata',
    query=urlencode({
        'method': 'getByStation',
        'token': TOKEN,
        'station': 'SCHDW.O1',
        'deviceCategory': 'OXYSENSOR',
    }, quote_via=quote, safe='/:'))

response = requests.get(data_url)
response.raise_for_status()

Calling the raise_for_status() method on the response object is a quick way to test for HTTP errors.

The default response type from the web services is JSON and requests provides a convenience method to convert JSON in the response to a Python dict object:

In [7]:
response.json()
Out[7]:
{'sensorData': [{'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-05-02T20:13:10.064Z',
     'value': 2.093462278}],
   'sensor': 'oxygen_corrected',
   'sensorName': 'Oxygen Concentration Corrected',
   'unitOfMeasure': 'ml/l'},
  {'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-05-02T20:13:10.064Z',
     'value': 9.9706}],
   'sensor': 'temperature',
   'sensorName': 'Temperature',
   'unitOfMeasure': 'C'}],
 'serviceMetadata': {'dateFrom': None,
  'dateTo': None,
  'deviceCategory': 'OXYSENSOR',
  'nextDateFrom': None,
  'outputFormat': None,
  'rowLimit': None,
  'sensors': None,
  'station': 'SCHDW.O1',
  'totalActualSamples': 2}}

So, let's put that all together into a function to query the ONC data web services:

In [23]:
def get_onc_data(endpoint, method, token, **query_params):
    url_tmpl = 'http://dmas.uvic.ca/api/{endpoint}?{query}'
    query = {'method': method, 'token': token}
    query.update(query_params)
    data_url = url_tmpl.format(
        endpoint=endpoint,
        query=urlencode(query, quote_via=quote, safe='/:'))
    response = requests.get(data_url)
    response.raise_for_status()
    return response.json()
In [24]:
get_onc_data('scalardata', 'getByStation', TOKEN, station='SCHDW.O1', deviceCategory='OXYSENSOR')
Out[24]:
{'sensorData': [{'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-05-02T20:13:10.064Z',
     'value': 2.093462278}],
   'sensor': 'oxygen',
   'sensorName': 'Oxygen',
   'unitOfMeasure': 'ml/l'},
  {'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-05-02T20:13:10.064Z',
     'value': 9.9706}],
   'sensor': 'temperature',
   'sensorName': 'Temperature',
   'unitOfMeasure': 'C'}],
 'serviceMetadata': {'dateFrom': None,
  'dateTo': None,
  'deviceCategory': 'OXYSENSOR',
  'nextDateFrom': None,
  'outputFormat': None,
  'rowLimit': None,
  'sensors': None,
  'station': 'SCHDW.O1',
  'totalActualSamples': 2}}

stations End-point

The stations end-point has a getTree method that returns a large, hierarchical tree data structure of stations, station codes, and devices.

get_onc_data('stations', 'getTree', TOKEN)

Rather than reading or parsing that data structure here, it is perhaps easier to find the station codes and device categories to use with the other end-points by looking at the list generated by the JavaScript Usage Example provided by ONC.

scalardata End-point

The scalardata end-point has a getByStation method that returns time series of data given a station code and a device category code. Its simplest use case is to return the most recent data for all sensors associated with the device category at the station.

VENUS Node CTD Data

Here is the most recent data from the CTD at the Salish Sea Central node VENUS Instrument Platform:

In [25]:
get_onc_data('scalardata', 'getByStation', TOKEN, station='USDDL', deviceCategory='CTD')
Out[25]:
{'sensorData': [{'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-07-11T19:03:06.364Z',
     'value': 113.16}],
   'sensor': 'Depth',
   'sensorName': 'Depth',
   'unitOfMeasure': 'm'},
  {'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-07-11T19:03:06.364Z',
     'value': 23.261575019}],
   'sensor': 'SIGMA_THETA',
   'sensorName': 'Sigma-theta (0 dbar)',
   'unitOfMeasure': 'kg/m3'},
  {'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-07-11T19:03:06.364Z',
     'value': 3.35354}],
   'sensor': 'cond',
   'sensorName': 'Conductivity',
   'unitOfMeasure': 'S/m'},
  {'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-07-11T19:03:06.364Z',
     'value': 1023.779048283}],
   'sensor': 'density',
   'sensorName': 'Density',
   'unitOfMeasure': 'kg/m3'},
  {'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-07-11T19:03:06.364Z',
     'value': 114.157}],
   'sensor': 'pressure',
   'sensorName': 'Pressure',
   'unitOfMeasure': 'decibar'},
  {'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-07-11T19:03:06.364Z',
     'value': 30.281}],
   'sensor': 'salinity',
   'sensorName': 'Practical Salinity',
   'unitOfMeasure': 'psu'},
  {'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-07-11T19:03:06.364Z',
     'value': 23.259561821}],
   'sensor': 'sigmaT',
   'sensorName': 'Sigma-t',
   'unitOfMeasure': 'kg/m3'},
  {'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-07-11T19:03:06.364Z',
     'value': 1486.198}],
   'sensor': 'sound_speed',
   'sensorName': 'Sound Speed',
   'unitOfMeasure': 'm/s'},
  {'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-07-11T19:03:06.364Z',
     'value': 10.0633}],
   'sensor': 'temperature',
   'sensorName': 'Temperature',
   'unitOfMeasure': 'C'}],
 'serviceMetadata': {'dateFrom': None,
  'dateTo': None,
  'deviceCategory': 'CTD',
  'nextDateFrom': None,
  'outputFormat': None,
  'rowLimit': None,
  'sensors': None,
  'station': 'USDDL',
  'totalActualSamples': 9}}

Adding a sensors item to the query with a value that is a comma-separated list of sensor codes limits the response to contain only the data from the specified sensors:

In [26]:
get_onc_data(
    'scalardata', 'getByStation', TOKEN, station='SCVIP', deviceCategory='CTD', 
    sensors='salinity,temperature')
Out[26]:
{'sensorData': [{'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-07-11T19:03:14.582Z',
     'value': 30.8947}],
   'sensor': 'salinity',
   'sensorName': 'Practical Salinity',
   'unitOfMeasure': 'psu'},
  {'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-07-11T19:03:14.582Z',
     'value': 9.5234}],
   'sensor': 'temperature',
   'sensorName': 'Temperature',
   'unitOfMeasure': 'C'}],
 'serviceMetadata': {'dateFrom': None,
  'dateTo': None,
  'deviceCategory': 'CTD',
  'nextDateFrom': None,
  'outputFormat': None,
  'rowLimit': None,
  'sensors': 'salinity,temperature',
  'station': 'SCVIP',
  'totalActualSamples': 2}}

Time series of data are obtained by adding dateFrom and dateTo items to the query:

In [31]:
get_onc_data(
    'scalardata', 'getByStation', TOKEN, station='SCVIP', deviceCategory='CTD',
    sensors='salinity,temperature',
    dateFrom='2016-06-28T00:26:45.000Z',
)
Out[31]:
{'sensorData': [{'actualSamples': 2,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-06-28T00:26:45.895Z',
     'value': 30.9153},
    {'qaqcFlag': 1,
     'sampleTime': '2016-06-28T00:26:46.895Z',
     'value': 30.9154}],
   'sensor': 'salinity',
   'sensorName': 'Practical Salinity',
   'unitOfMeasure': 'psu'},
  {'actualSamples': 2,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-06-28T00:26:45.895Z',
     'value': 9.563},
    {'qaqcFlag': 1,
     'sampleTime': '2016-06-28T00:26:46.895Z',
     'value': 9.5627}],
   'sensor': 'temperature',
   'sensorName': 'Temperature',
   'unitOfMeasure': 'C'}],
 'serviceMetadata': {'dateFrom': '2016-06-28T00:26:45.000Z',
  'dateTo': None,
  'deviceCategory': 'CTD',
  'nextDateFrom': None,
  'outputFormat': None,
  'rowLimit': None,
  'sensors': 'salinity,temperature',
  'station': 'SCVIP',
  'totalActualSamples': 4}}

With only a dateFrom item in the query, the time series defaults to 1 day in length. Adding a dateTo item sets the end of the time series by date/time stamp:

In [27]:
get_onc_data(
    'scalardata', 'getByStation', TOKEN, station='SCVIP', deviceCategory='CTD',
    sensors='salinity,temperature',
    dateFrom='2016-06-21T17:58:45.000Z', dateTo='2016-06-21T17:58:50.000Z',
)
Out[27]:
{'sensorData': [{'actualSamples': 5,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-06-21T17:58:45.400Z',
     'value': 31.1297},
    {'qaqcFlag': 1,
     'sampleTime': '2016-06-21T17:58:46.440Z',
     'value': 31.1294},
    {'qaqcFlag': 1,
     'sampleTime': '2016-06-21T17:58:47.400Z',
     'value': 31.1285},
    {'qaqcFlag': 1,
     'sampleTime': '2016-06-21T17:58:48.444Z',
     'value': 31.1285},
    {'qaqcFlag': 1,
     'sampleTime': '2016-06-21T17:58:49.400Z',
     'value': 31.1282}],
   'sensor': 'salinity',
   'sensorName': 'Practical Salinity',
   'unitOfMeasure': 'psu'},
  {'actualSamples': 5,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-06-21T17:58:45.400Z',
     'value': 9.3081},
    {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:46.440Z', 'value': 9.3083},
    {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:47.400Z', 'value': 9.3087},
    {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:48.444Z', 'value': 9.3088},
    {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:49.400Z', 'value': 9.309}],
   'sensor': 'temperature',
   'sensorName': 'Temperature',
   'unitOfMeasure': 'C'}],
 'serviceMetadata': {'dateFrom': '2016-06-21T17:58:45.000Z',
  'dateTo': '2016-06-21T17:58:50.000Z',
  'deviceCategory': 'CTD',
  'nextDateFrom': '2016-06-21T17:58:50.444Z',
  'outputFormat': None,
  'rowLimit': None,
  'sensors': 'salinity,temperature',
  'station': 'SCVIP',
  'totalActualSamples': 10}}

The number of measurements returned can be specified directly with the rowLimit query item. Note also that there is a hard limit of 100,000 measurements per sensor per request.

In [28]:
get_onc_data(
    'scalardata', 'getByStation', TOKEN, station='SCVIP', deviceCategory='CTD',
    sensors='salinity,temperature',
    dateFrom='2016-06-21T17:58:45.000Z', rowLimit=2,
)
Out[28]:
{'sensorData': [{'actualSamples': 2,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-06-21T17:58:45.400Z',
     'value': 31.1297},
    {'qaqcFlag': 1,
     'sampleTime': '2016-06-21T17:58:46.440Z',
     'value': 31.1294}],
   'sensor': 'salinity',
   'sensorName': 'Practical Salinity',
   'unitOfMeasure': 'psu'},
  {'actualSamples': 2,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-06-21T17:58:45.400Z',
     'value': 9.3081},
    {'qaqcFlag': 1,
     'sampleTime': '2016-06-21T17:58:46.440Z',
     'value': 9.3083}],
   'sensor': 'temperature',
   'sensorName': 'Temperature',
   'unitOfMeasure': 'C'}],
 'serviceMetadata': {'dateFrom': '2016-06-21T17:58:45.000Z',
  'dateTo': None,
  'deviceCategory': 'CTD',
  'nextDateFrom': '2016-06-21T17:58:47.400Z',
  'outputFormat': None,
  'rowLimit': 2,
  'sensors': 'salinity,temperature',
  'station': 'SCVIP',
  'totalActualSamples': 4}}
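When rowLimit truncates the result, the response above still reports a nextDateFrom ('2016-06-21T17:58:47.400Z'), which suggests a way to page through time series too long for one request (e.g. beyond the 100,000 measurements per sensor limit). A minimal sketch, assuming (not confirmed by the ONC docs quoted here) that nextDateFrom can be passed back as the next request's dateFrom; iter_onc_pages is a hypothetical helper:

```python
def iter_onc_pages(fetch, **query):
    """Yield successive scalardata responses, following serviceMetadata['nextDateFrom'].

    fetch is any callable that takes the query items as keyword arguments and
    returns the decoded JSON dict. Assumes all timestamps use the same
    fixed-width yyyy-MM-ddTHH:mm:ss.SSSZ format, so they compare correctly as strings.
    """
    date_to = query.get('dateTo')
    while True:
        response = fetch(**query)
        yield response
        next_from = response['serviceMetadata'].get('nextDateFrom')
        # Stop when the service reports no next time, or the next page would
        # start at or after the requested end of the time series
        if next_from is None or (date_to is not None and next_from >= date_to):
            return
        # Same query, resumed from the reported next sample time
        query = dict(query, dateFrom=next_from)
```

With the notebook's own function, fetch could be functools.partial(get_onc_data, 'scalardata', 'getByStation', TOKEN), called as iter_onc_pages(fetch, station='SCVIP', deviceCategory='CTD', sensors='salinity,temperature', dateFrom='2016-06-21T17:58:45.000Z', dateTo='2016-06-21T17:58:50.000Z', rowLimit=2). Merging the pages' sensorData lists is left out of the sketch.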

The values for dateFrom and dateTo are in UTC and must be strings formatted as yyyy-MM-ddTHH:mm:ss.SSSZ. That format is annoying enough to type, and timezone conversions are error-prone enough that it is worth writing a function to handle the details:

In [29]:
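def onc_datetime(datetime_str, timezone='Canada/Pacific'):
    # Parse the date/time string; arrow treats a string without an offset as UTC
    d = arrow.get(datetime_str)
    # Reinterpret the same wall-clock time as being in the given timezone
    d_tz = arrow.get(d.datetime, timezone)
    # Convert to UTC and format as yyyy-MM-ddTHH:mm:ss.SSSZ
    d_utc = d_tz.to('utc')
    return '{}Z'.format(d_utc.format('YYYY-MM-DDTHH:mm:ss.SSS'))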

The onc_datetime() function has been added to the salishsea_tools.data_tools module.

In [30]:
onc_datetime('2016-06-21 10:58:45')
Out[30]:
'2016-06-21T17:58:45.000Z'
In [31]:
onc_datetime('2016-06-21 17:58:45', 'utc')
Out[31]:
'2016-06-21T17:58:45.000Z'
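For comparison, here is a minimal standard-library sketch of the formatting half of onc_datetime(), taking an already timezone-aware datetime. to_onc_datetime is a hypothetical name. On Python 3.9+ a DST-aware zone could come from zoneinfo.ZoneInfo('America/Vancouver'); the example uses a fixed UTC-7 offset (Pacific Daylight Time on this date) so that it runs without a timezone database:

```python
from datetime import datetime, timedelta, timezone


def to_onc_datetime(dt):
    """Format an aware datetime as a yyyy-MM-ddTHH:mm:ss.SSSZ string in UTC."""
    if dt.tzinfo is None:
        raise ValueError('dt must be timezone-aware')
    utc = dt.astimezone(timezone.utc)
    # strftime has no millisecond directive, so build the .SSS part by hand
    # (truncating microseconds, as arrow's SSS token does)
    return '{}.{:03d}Z'.format(utc.strftime('%Y-%m-%dT%H:%M:%S'), utc.microsecond // 1000)


pdt = timezone(timedelta(hours=-7))
print(to_onc_datetime(datetime(2016, 6, 21, 10, 58, 45, tzinfo=pdt)))
# 2016-06-21T17:58:45.000Z
```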
In [32]:
get_onc_data(
    'scalardata', 'getByStation', TOKEN, station='SCVIP', deviceCategory='CTD',
    sensors='salinity,temperature',
    dateFrom=onc_datetime('2016-06-21 10:58:45'), dateTo=onc_datetime('2016-06-21 10:58:50'),
)
Out[32]:
{'sensorData': [{'actualSamples': 5,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-06-21T17:58:45.400Z',
     'value': 31.1297},
    {'qaqcFlag': 1,
     'sampleTime': '2016-06-21T17:58:46.440Z',
     'value': 31.1294},
    {'qaqcFlag': 1,
     'sampleTime': '2016-06-21T17:58:47.400Z',
     'value': 31.1285},
    {'qaqcFlag': 1,
     'sampleTime': '2016-06-21T17:58:48.444Z',
     'value': 31.1285},
    {'qaqcFlag': 1,
     'sampleTime': '2016-06-21T17:58:49.400Z',
     'value': 31.1282}],
   'sensor': 'salinity',
   'sensorName': 'Practical Salinity',
   'unitOfMeasure': 'psu'},
  {'actualSamples': 5,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-06-21T17:58:45.400Z',
     'value': 9.3081},
    {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:46.440Z', 'value': 9.3083},
    {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:47.400Z', 'value': 9.3087},
    {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:48.444Z', 'value': 9.3088},
    {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:49.400Z', 'value': 9.309}],
   'sensor': 'temperature',
   'sensorName': 'Temperature',
   'unitOfMeasure': 'C'}],
 'serviceMetadata': {'dateFrom': '2016-06-21T17:58:45.000Z',
  'dateTo': '2016-06-21T17:58:50.000Z',
  'deviceCategory': 'CTD',
  'nextDateFrom': '2016-06-21T17:58:50.444Z',
  'outputFormat': None,
  'rowLimit': None,
  'sensors': 'salinity,temperature',
  'station': 'SCVIP',
  'totalActualSamples': 10}}

Ferry Temperature and Salinity Data

Each instrumented ferry is a station. Here are the most recent available data from the TSG (thermosalinograph) device aboard the Tsawwassen to Duke Point ferry:

In [33]:
get_onc_data('scalardata', 'getByStation', TOKEN, station='TWDP', deviceCategory='TSG')
Out[33]:
{'sensorData': [{'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-07-11T16:41:10.218Z',
     'value': 2.95693}],
   'sensor': 'Conductivity',
   'sensorName': 'Conductivity',
   'unitOfMeasure': 'S/m'},
  {'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-07-11T16:41:10.218Z',
     'value': 22.6041}],
   'sensor': 'salinity',
   'sensorName': 'Practical Salinity',
   'unitOfMeasure': 'psu'},
  {'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-07-11T16:41:10.218Z',
     'value': 15.9418}],
   'sensor': 'temperature',
   'sensorName': 'Temperature',
   'unitOfMeasure': 'C'}],
 'serviceMetadata': {'dateFrom': None,
  'dateTo': None,
  'deviceCategory': 'TSG',
  'nextDateFrom': None,
  'outputFormat': None,
  'rowLimit': None,
  'sensors': None,
  'station': 'TWDP',
  'totalActualSamples': 3}}

Note that the data may lag the present time by several hours because they are only transmitted from the ferry to the ONC servers when the ferry is docked. There also appears to be a gap of several hours in the data each day, presumably while the ferry is docked overnight.
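A minimal sketch for locating such daily gaps in a sorted list of sample times (e.g. datetimes converted from the sampleTime strings). find_gaps is a hypothetical helper, and the 1 hour default threshold is an arbitrary choice:

```python
from datetime import datetime, timedelta


def find_gaps(times, min_gap=timedelta(hours=1)):
    """Return (last_before_gap, first_after_gap) pairs where consecutive
    sorted sample times are more than min_gap apart."""
    return [(earlier, later)
            for earlier, later in zip(times, times[1:])
            if later - earlier > min_gap]
```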

The ferry's location is available from the NAV device. Note that the times from the TSG and NAV devices do not appear to be synchronized.

In [34]:
get_onc_data('scalardata', 'getByStation', TOKEN, station='TWDP', deviceCategory='NAV')
Out[34]:
{'sensorData': [{'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-07-11T17:19:52.257Z',
     'value': 49.0047}],
   'sensor': 'Latitude',
   'sensorName': 'Latitude',
   'unitOfMeasure': 'deg'},
  {'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-07-11T17:19:52.257Z',
     'value': -123.1335}],
   'sensor': 'Longitude',
   'sensorName': 'Longitude',
   'unitOfMeasure': 'deg'},
  {'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-07-11T17:19:52.257Z',
     'value': 224.0}],
   'sensor': 'ship_course',
   'sensorName': 'Ship Course',
   'unitOfMeasure': 'deg'},
  {'actualSamples': 1,
   'data': [{'qaqcFlag': 1,
     'sampleTime': '2016-07-11T17:19:52.257Z',
     'value': 4.115552}],
   'sensor': 'speed_over_ground',
   'sensorName': 'Speed over Ground',
   'unitOfMeasure': 'm/s'}],
 'serviceMetadata': {'dateFrom': None,
  'dateTo': None,
  'deviceCategory': 'NAV',
  'nextDateFrom': None,
  'outputFormat': None,
  'rowLimit': None,
  'sensors': None,
  'station': 'TWDP',
  'totalActualSamples': 4}}
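Because the TSG and NAV sample times do not line up, one way to attach a position to each TSG sample is nearest-neighbour matching in time. This is a sketch under the assumption that the nearest NAV fix is close enough (linear interpolation between fixes would be an alternative); nearest_index and match_positions are hypothetical helpers, and nav_times must be sorted and non-empty:

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_index(sorted_times, t):
    """Index of the element of sorted_times closest to t (ties go to the earlier one)."""
    i = bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    before, after = sorted_times[i - 1], sorted_times[i]
    return i if after - t < t - before else i - 1


def match_positions(tsg_times, nav_times, nav_lats, nav_lons):
    """For each TSG time, return (lat, lon, time offset) of the nearest NAV fix.

    The offset lets callers reject matches that are too far apart in time.
    """
    positions = []
    for t in tsg_times:
        j = nearest_index(nav_times, t)
        positions.append((nav_lats[j], nav_lons[j], abs(nav_times[j] - t)))
    return positions


# Illustrative times only (not ONC data): three NAV fixes 2 s apart
t0 = datetime(2016, 7, 11, 17, 19, 52)
nav_times = [t0 + timedelta(seconds=s) for s in (0, 2, 4)]
print(nearest_index(nav_times, t0 + timedelta(seconds=3.1)))  # 2
```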

Parsing scalardata

The Python dict that we get from the scalardata end-point has 2 top-level keys: sensorData and serviceMetadata.

In [35]:
data = get_onc_data(
    'scalardata', 'getByStation', TOKEN, station='SCVIP', deviceCategory='CTD',
    sensors='salinity,temperature',
    dateFrom=onc_datetime('2016-06-21 10:58:45'), dateTo=onc_datetime('2016-06-21 10:58:50'),
)
In [36]:
data['serviceMetadata']
Out[36]:
{'dateFrom': '2016-06-21T17:58:45.000Z',
 'dateTo': '2016-06-21T17:58:50.000Z',
 'deviceCategory': 'CTD',
 'nextDateFrom': '2016-06-21T17:58:50.444Z',
 'outputFormat': None,
 'rowLimit': None,
 'sensors': 'salinity,temperature',
 'station': 'SCVIP',
 'totalActualSamples': 10}

serviceMetadata is a dict of metadata attributes of the returned data as a whole.

In [37]:
data['sensorData']
Out[37]:
[{'actualSamples': 5,
  'data': [{'qaqcFlag': 1,
    'sampleTime': '2016-06-21T17:58:45.400Z',
    'value': 31.1297},
   {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:46.440Z', 'value': 31.1294},
   {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:47.400Z', 'value': 31.1285},
   {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:48.444Z', 'value': 31.1285},
   {'qaqcFlag': 1,
    'sampleTime': '2016-06-21T17:58:49.400Z',
    'value': 31.1282}],
  'sensor': 'salinity',
  'sensorName': 'Practical Salinity',
  'unitOfMeasure': 'psu'},
 {'actualSamples': 5,
  'data': [{'qaqcFlag': 1,
    'sampleTime': '2016-06-21T17:58:45.400Z',
    'value': 9.3081},
   {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:46.440Z', 'value': 9.3083},
   {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:47.400Z', 'value': 9.3087},
   {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:48.444Z', 'value': 9.3088},
   {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:49.400Z', 'value': 9.309}],
  'sensor': 'temperature',
  'sensorName': 'Temperature',
  'unitOfMeasure': 'C'}]

sensorData is a list of dicts containing the sensor data and metadata for each of the sensors requested in the query (or all of the sensors in the deviceCategory if an explicit list of sensors was not included in the query).

The metadata keys in each list element are:

  • sensor: the sensor id (as listed in the query)
  • actualSamples: the count of the data samples for the sensor
  • sensorName: the sensor's descriptive name
  • unitOfMeasure: the sensor's unit of measure
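Indexing sensorData by position (e.g. data['sensorData'][0]) relies on the order in which the service returns the sensors. A minimal sketch that looks sensors up by their sensor code instead; sensors_by_code and units_by_code are hypothetical helpers:

```python
def sensors_by_code(onc_json):
    """Index the sensorData list of a scalardata response by sensor code.

    Each value is the original sensor dict, with keys sensor, sensorName,
    unitOfMeasure, actualSamples, and data.
    """
    return {sensor['sensor']: sensor for sensor in onc_json['sensorData']}


def units_by_code(onc_json):
    """Map each sensor code to its unitOfMeasure string."""
    return {code: sensor['unitOfMeasure']
            for code, sensor in sensors_by_code(onc_json).items()}
```

For the data fetched above, sensors_by_code(data)['salinity']['unitOfMeasure'] would be 'psu'.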

The sensor data themselves are in a list of dicts that is the value associated with the data key:

In [38]:
data['sensorData'][0]['data']
Out[38]:
[{'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:45.400Z', 'value': 31.1297},
 {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:46.440Z', 'value': 31.1294},
 {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:47.400Z', 'value': 31.1285},
 {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:48.444Z', 'value': 31.1285},
 {'qaqcFlag': 1, 'sampleTime': '2016-06-21T17:58:49.400Z', 'value': 31.1282}]

We can parse those dicts into lists of data items with list comprehensions:

In [39]:
qaqcFlag = [d['qaqcFlag'] for d in data['sensorData'][0]['data']]
qaqcFlag
Out[39]:
[1, 1, 1, 1, 1]

The meaning of the qaqcFlag is described in the ONC docs.

In [40]:
salinity = [d['value'] for d in data['sensorData'][0]['data']]
salinity
Out[40]:
[31.1297, 31.1294, 31.1285, 31.1285, 31.1282]
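The list comprehensions above walk the data list once per field. A sketch that builds all three columns in one pass and can optionally keep only samples with particular qaqcFlag values; sensor_columns is a hypothetical helper, and which flag values to accept is an assumption to check against the ONC QA/QC documentation:

```python
def sensor_columns(sensor, accepted_flags=None):
    """Split one sensorData element's 'data' list into parallel lists.

    Returns (sample_times, values, qaqc_flags). If accepted_flags is given,
    only samples whose qaqcFlag is in it are kept.
    """
    rows = [
        (d['sampleTime'], d['value'], d['qaqcFlag'])
        for d in sensor['data']
        if accepted_flags is None or d['qaqcFlag'] in accepted_flags
    ]
    if not rows:
        return [], [], []
    # Transpose the list of (time, value, flag) rows into three lists
    sample_times, values, flags = (list(column) for column in zip(*rows))
    return sample_times, values, flags
```

sensor_columns(data['sensorData'][0]) returns the same sampleTime, value, and qaqcFlag lists as the separate comprehensions.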

It's convenient to use arrow to convert the sampleTime strings to Python objects:

In [41]:
timestamp = [arrow.get(d['sampleTime']) for d in data['sensorData'][0]['data']]
timestamp
Out[41]:
[<Arrow [2016-06-21T17:58:45.400000+00:00]>,
 <Arrow [2016-06-21T17:58:46.440000+00:00]>,
 <Arrow [2016-06-21T17:58:47.400000+00:00]>,
 <Arrow [2016-06-21T17:58:48.444000+00:00]>,
 <Arrow [2016-06-21T17:58:49.400000+00:00]>]

and from there it is easy to get timezone-aware datetime objects if we need them:

In [42]:
[t.datetime for t in timestamp]
Out[42]:
[datetime.datetime(2016, 6, 21, 17, 58, 45, 400000, tzinfo=tzutc()),
 datetime.datetime(2016, 6, 21, 17, 58, 46, 440000, tzinfo=tzutc()),
 datetime.datetime(2016, 6, 21, 17, 58, 47, 400000, tzinfo=tzutc()),
 datetime.datetime(2016, 6, 21, 17, 58, 48, 444000, tzinfo=tzutc()),
 datetime.datetime(2016, 6, 21, 17, 58, 49, 400000, tzinfo=tzutc())]
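If arrow is not available, the standard library can parse this one fixed format. A minimal sketch, assuming every sampleTime has the .SSSZ shape seen in these responses (arrow.get is more forgiving about variations); parse_onc_time is a hypothetical name:

```python
from datetime import datetime, timezone


def parse_onc_time(sample_time):
    """Parse an ONC sampleTime string such as '2016-06-21T17:58:45.400Z'
    into a timezone-aware UTC datetime."""
    # %f accepts the 3-digit millisecond field; the trailing Z is matched literally
    naive = datetime.strptime(sample_time, '%Y-%m-%dT%H:%M:%S.%fZ')
    return naive.replace(tzinfo=timezone.utc)


print(parse_onc_time('2016-06-21T17:58:45.400Z'))
# 2016-06-21 17:58:45.400000+00:00
```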

Rather than dealing with the layers of dicts and lists that we get back from the scalardata service, it is worthwhile to create a function that constructs an xarray.Dataset object containing the data and metadata.

In [68]:
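def onc_json_to_dataset(onc_json):
    """Construct an xarray.Dataset from a scalardata response dict.

    Each sensor becomes a data variable on a sampleTime dimension;
    serviceMetadata becomes the dataset attributes.
    """
    data_vars = {}
    # Iterate over the function argument, not the global data variable
    for sensor in onc_json['sensorData']:
        data_vars[sensor['sensor']] = xr.DataArray(
            name=sensor['sensor'],
            data=[d['value'] for d in sensor['data']],
            # Name the dimension explicitly rather than relying on xarray
            # to infer it from the coords dict
            dims='sampleTime',
            coords={
                # .naive gives timezone-naive datetimes in UTC,
                # which xarray stores as datetime64[ns]
                'sampleTime': [arrow.get(d['sampleTime']).naive
                               for d in sensor['data']],
            },
            attrs={
                'qaqcFlag': np.array([d['qaqcFlag'] for d in sensor['data']]),
                'sensorName': sensor['sensorName'],
                'unitOfMeasure': sensor['unitOfMeasure'],
                'actualSamples': sensor['actualSamples'],
            }
        )
    return xr.Dataset(data_vars, attrs=onc_json['serviceMetadata'])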

The onc_json_to_dataset() function has been added to the salishsea_tools.data_tools module.

In [76]:
onc_json_to_dataset(data)
Out[76]:
<xarray.Dataset>
Dimensions:      (sampleTime: 5)
Coordinates:
  * sampleTime   (sampleTime) datetime64[ns] 2016-06-21T17:58:45.400000 ...
Data variables:
    salinity     (sampleTime) float64 31.13 31.13 31.13 31.13 31.13
    temperature  (sampleTime) float64 9.308 9.308 9.309 9.309 9.309
Attributes:
    deviceCategory: CTD
    station: SCVIP
    rowLimit: None
    dateTo: 2016-06-21T17:58:50.000Z
    outputFormat: None
    dateFrom: 2016-06-21T17:58:45.000Z
    nextDateFrom: 2016-06-21T17:58:50.444Z
    sensors: salinity,temperature
    totalActualSamples: 10

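The xarray.Dataset constructor aligns the sampleTime coordinates of the 2 sensors onto a single dataset coordinate. Here both sensors were sampled at the same times, so nothing is lost; if their sample times differed, the constructor's default outer join would fill the non-matching times with NaN. The identity test below returns False only because each attribute access creates a new DataArray object, so it does not show that separate per-variable sampleTime arrays are preserved: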

In [73]:
ds = onc_json_to_dataset(data)
ds.salinity.sampleTime is ds.temperature.sampleTime
Out[73]:
False