Siphon (remote_open)

Unidata AMS 2021 Student Conference

This notebook demonstrates Siphon's remote_open method, which opens a remote dataset from a TDS catalog for random access. It returns a file-like object that can be read like a local file to get the dataset's raw bytes.

Focuses¶

Open remote datasets on the TDS
Use the returned object to read the dataset as raw bytes
Interface with the dataset as if stored in a local file

Objectives¶

Find a dataset in a TDS Catalog
Open the dataset using remote_open
Read the returned object like a local file

Imports¶

Before beginning, let's import the packages to be used throughout this training:

In [ ]:

import matplotlib.pyplot as plt
import numpy as np
from siphon.catalog import TDSCatalog

1. Find a dataset in a TDS Catalog¶

Before we use remote_open, we need to find a dataset that we'd like to access.
As an example, we'll use this dataset from the NOAA NCEI THREDDS catalog.

To access a dataset, we need to know two things:

the URL of the catalog where the dataset lives
the dataset name

The dataset name can be found on the dataset HTML page, e.g. "nam_218_20210104_0600_006.grb2".
The catalog URL is the URL of the dataset page up to and including ".html", with ".html" replaced by ".xml".

In [ ]:

catUrl = "https://www.ncei.noaa.gov/thredds/catalog/model-namanl/202101/20210104/catalog.xml"
datasetName = "nam_218_20210104_0600_006.grb2"
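
The URL rule described above can be sketched as a small helper. This is a minimal sketch using only the standard library; the function name `catalog_xml_url` is hypothetical, and the `?dataset=example` query string is an assumed shape for a dataset page URL, not one copied from the NCEI site.

```python
from urllib.parse import urlsplit, urlunsplit


def catalog_xml_url(html_url):
    """Turn a THREDDS HTML page URL into the matching catalog XML URL.

    Hypothetical helper: keeps the URL up to ".html", drops any query
    string or fragment, and swaps the ".html" suffix for ".xml".
    """
    parts = urlsplit(html_url)
    if not parts.path.endswith(".html"):
        raise ValueError(f"expected a path ending in .html, got {parts.path!r}")
    xml_path = parts.path[: -len(".html")] + ".xml"
    return urlunsplit((parts.scheme, parts.netloc, xml_path, "", ""))


# The query string below is an assumed example of a dataset page URL.
page = ("https://www.ncei.noaa.gov/thredds/catalog/model-namanl/202101/20210104/"
        "catalog.html?dataset=example")
print(catalog_xml_url(page))
```

The result can be passed to TDSCatalog in place of a hand-edited URL.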

Next, we access the catalog using the catalog URL:

In [ ]:

catalog = TDSCatalog(catUrl)

And then select our dataset using the dataset name:

In [ ]:

ds = catalog.datasets[datasetName]
ds.name

We can now view the access protocols available for our dataset.

In [ ]:

list(ds.access_urls)

The list of services available for this dataset includes HTTPServer, which we'll need to open the dataset using remote_open.
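
Not every dataset offers every service, so it can help to check for HTTPServer before going further. The sketch below assumes that ds.access_urls behaves like a mapping from service name to URL; the helper name `require_service` and the placeholder URLs are hypothetical, and a plain dict stands in for the real object so the cell runs without network access.

```python
def require_service(access_urls, service="HTTPServer"):
    """Return the URL for `service`, or raise an error listing what is available."""
    if service not in access_urls:
        available = ", ".join(sorted(access_urls)) or "none"
        raise KeyError(f"{service} not offered; available services: {available}")
    return access_urls[service]


# Stand-in for ds.access_urls; the URLs are placeholders, not real endpoints.
fake_access_urls = {
    "OPENDAP": "https://example.invalid/dodsC/file.grb2",
    "HTTPServer": "https://example.invalid/fileServer/file.grb2",
}
print(require_service(fake_access_urls))
```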

2. Open the dataset using `remote_open`¶

We'll now use Siphon's remote_open to obtain a file-like object representing the dataset.

In [ ]:

data_file = ds.remote_open()
data_file

We now have an object that we can read much like a local file.

In [ ]:

data = data_file.readline()
data

Note: When we use remote_open to read a dataset, we are reading raw data from a file-like object rather than formatted data. The b prefix on the output indicates a bytes object, not a text string.
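
In a binary file, a "line" has no special meaning: readline simply returns everything up to and including the next newline byte (0x0A), wherever that byte happens to fall. The sketch below illustrates this with io.BytesIO standing in for the object returned by remote_open; the byte values are made up and are not real GRIB data.

```python
import io

# Stand-in for the object returned by remote_open: made-up bytes, not real GRIB data.
raw = io.BytesIO(b"GRIB\x00\x00\x00\x02\n\x01\x02\x03\nmore")

first = raw.readline()   # stops after the first newline byte (0x0A)
second = raw.readline()  # continues up to the next newline byte
print(first)
print(second)
print(type(first))       # a bytes object, not a str
```

For binary formats, reading a fixed number of bytes is usually more predictable than reading by line.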

3. Read the returned object like a local file¶

We can now read our dataset using random access.

We can read a line, as we did in the previous section, or we can read a specified number of bytes.

In [ ]:

data = data_file.read(100)
data

We can change our position in the file using seek, similar to moving a cursor in a file. The position is given in bytes.

In [ ]:

data_file.seek(0) # move "cursor" to start of file
print(data_file.read(4)) # print first 4 bytes
data_file.seek(50) # move "cursor" to byte 50
print(data_file.read(10)) # print 10 more bytes
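
Besides absolute positions, standard Python file objects accept a second argument to seek that makes the offset relative to the current position or to the end of the file, and tell reports where the cursor is. The sketch below uses io.BytesIO with made-up contents as a stand-in, assuming the object returned by remote_open supports the same calls.

```python
import io

# Made-up contents: 4 + 40 + 4 bytes. The name `demo` avoids clobbering data_file.
demo = io.BytesIO(b"GRIB" + bytes(range(40)) + b"7777")

demo.seek(0, io.SEEK_END)    # jump to the end of the file
size = demo.tell()           # the position at the end equals the size in bytes
demo.seek(-4, io.SEEK_END)   # 4 bytes before the end
tail = demo.read(4)          # read the last 4 bytes
demo.seek(4)                 # absolute position: byte 4
demo.seek(2, io.SEEK_CUR)    # 2 bytes forward from the current position
print(size, tail, demo.tell())
```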

And we can read the data directly into a byte array.

In [ ]:

b = bytearray(100) # create a byte array of length 100
data_file.readinto(b) # read 100 bytes into the byte array
b[:]
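
One detail worth knowing: readinto returns the number of bytes it actually wrote, which can be smaller than the array when the end of the file is near. The sketch below shows this with io.BytesIO and made-up contents as a stand-in for the remote object.

```python
import io

demo = io.BytesIO(b"0123456789")  # 10 made-up bytes
buf = bytearray(8)                # reusable 8-byte buffer

n1 = demo.readinto(buf)           # enough data left: fills all 8 slots
first = bytes(buf[:n1])
n2 = demo.readinto(buf)           # only 2 bytes remain, so only buf[:2] is overwritten
second = bytes(buf[:n2])
print(n1, first, n2, second)
```

Slicing the array with the returned count avoids mistaking stale bytes from an earlier read for new data.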

Calling getbuffer returns a memoryview: a readable and writable view of the dataset bytes held locally in memory, created without copying them.

In [ ]:

b = data_file.getbuffer()
b

We can use the memory buffer to make local writes. Writing to the buffer changes the contents of data_file in memory, but does not write to the remote file.

In [ ]:

data_file.seek(100) # move "cursor" position to byte 100
b[100:110] = b"helloworld" # the `b` prefix before "helloworld" tells Python to interpret it as bytes
data_file.seek(100) # return "cursor" to byte 100
n = data_file.read(10) # read back the written bytes
n
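
The behavior of the memory buffer can be seen in isolation with io.BytesIO, which is assumed here to behave like the object returned by remote_open. The sketch shows that a write through the view edits the underlying bytes in place, and that the file object cannot be closed until the view is released; the contents are made up.

```python
import io

demo = io.BytesIO(b"0123456789")  # made-up contents
view = demo.getbuffer()           # memoryview over the buffer, no copy made

view[2:5] = b"abc"                # same-length slice assignment edits in place
edited = demo.getvalue()          # copy of the current contents
print(edited)

blocked = False
try:
    demo.close()                  # not allowed while the view still exists
except BufferError as err:
    blocked = True
    print("cannot close yet:", err)

view.release()                    # give the view back
demo.close()                      # now closing succeeds
```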

We have opened a remote dataset and read parts of it using random access! Use remote_open when you want access to the raw data in a dataset, e.g., if you have Python code to read bytes in a particular format.

Note: Without some prior knowledge of the dataset's format, remote_open is not an effective way to parse data. Since we are reading a raw file object, we need to know the layout of the data and the data types (e.g. ints, floats, etc.). To read a dataset as a netCDF object, use remote_access.
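
As an illustration of what "knowing the layout" means, the sketch below decodes a 16-byte header with the standard struct module. It assumes the GRIB edition 2 Indicator Section layout (a 4-byte "GRIB" marker, 2 reserved bytes, a 1-byte discipline, a 1-byte edition number, and an 8-byte total message length, all big-endian); check the WMO specification before relying on it. The header here is synthetic, built in memory rather than read from the NCEI file.

```python
import io
import struct

# Format string: > big-endian | 4s marker | H reserved | B discipline | B edition | Q length
HEADER_FORMAT = ">4sHBBQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 16 bytes

# Synthetic header with made-up values, standing in for the start of a real file.
fake = io.BytesIO(struct.pack(HEADER_FORMAT, b"GRIB", 0, 0, 2, 16))

magic, _reserved, discipline, edition, total_length = struct.unpack(
    HEADER_FORMAT, fake.read(HEADER_SIZE)
)
print(magic, discipline, edition, total_length)
```

With a real file, the same unpack call could be applied to the first 16 bytes returned by data_file.read after seeking to the start.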
