This notebook demonstrates Siphon's remote_open method, which opens a remote dataset from a TDS catalog for random access.
remote_open returns a file-like object that can be read much like a local file to obtain raw data.
Before beginning, let's import the packages to be used throughout this training:
import matplotlib.pyplot as plt
import numpy as np
from siphon.catalog import TDSCatalog
Before we use remote_open, we need to find a dataset that we'd like to access.
As an example, we'll use this dataset from the NOAA NCEI THREDDS catalog.
To access a dataset, we need to know two things: the dataset name and the catalog URL.
The dataset name can be found on the dataset HTML page, e.g. "nam_218_20210104_0600_006.grb2".
The catalog URL is the URL of the dataset page, cut off after ".html", with ".html" replaced by ".xml".
catUrl = "https://www.ncei.noaa.gov/thredds/catalog/model-namanl/202101/20210104/catalog.xml"
datasetName = "nam_218_20210104_0600_006.grb2"
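The URL rule above can be sketched as a small helper. The name catalog_xml_url is hypothetical (it is not part of Siphon), and the query string in the example page URL is only a placeholder for whatever follows ".html" on the real dataset page.

```python
def catalog_xml_url(dataset_page_url):
    """Turn a THREDDS dataset HTML page URL into its catalog XML URL.

    Hypothetical helper, not part of Siphon: keeps everything up to
    ".html" and swaps that suffix for ".xml".
    """
    base, sep, _ = dataset_page_url.partition(".html")
    if not sep:
        raise ValueError("expected a URL containing '.html'")
    return base + ".xml"


# Placeholder query string; only the part before ".html" matters here.
page_url = (
    "https://www.ncei.noaa.gov/thredds/catalog/model-namanl/"
    "202101/20210104/catalog.html?dataset=placeholder"
)
xml_url = catalog_xml_url(page_url)
print(xml_url)
```

The result should match the catUrl value used in this notebook.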
Next, we access the catalog using the catalog URL:
catalog = TDSCatalog(catUrl)
And then select our dataset using the dataset name:
ds = catalog.datasets[datasetName]
ds.name
We can now view the access protocols available for our dataset.
list(ds.access_urls)
The services available for this dataset include HTTPServer, which remote_open needs in order to open the dataset.
remote_open
We'll now use Siphon's remote_open to obtain a file-like object representing the dataset.
data_file = ds.remote_open()
data_file
We now have an object that we can read much like a local file.
data = data_file.readline()
data
Note: When we use remote_open to read a dataset, we are reading raw data from a file-like object, rather than formatted data. The b prefix at the start of the output indicates that the value is a bytes object rather than a text string.
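To make the bytes-versus-text distinction concrete, here is a minimal sketch that uses an in-memory io.BytesIO as a stand-in for the object returned by remote_open, so it runs without network access. The contents of fake_file are made up for illustration and are not the real dataset.

```python
import io

# Stand-in for the remote file-like object (assumption: made-up contents).
fake_file = io.BytesIO(b"GRIB\x00\x00\x00\x02 raw payload\n")

raw = fake_file.read(4)       # a bytes object, displayed with a b prefix
text = raw.decode("ascii")    # decode explicitly to get a str
first_byte = raw[0]           # indexing bytes yields an int, not a character

print(raw, text, first_byte)
```

Decoding only makes sense for the parts of a file that really are text; binary fields need to be interpreted according to the file format.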
We can now read our dataset using random access.
We can read a line, as we did in the previous section, or we can read a specified number of bytes.
data = data_file.read(100)
data
We can change our position in the file using seek, similar to moving a cursor in a file. The position is given in bytes.
data_file.seek(0) # move "cursor" to start of file
print(data_file.read(4)) # print first 4 bytes
data_file.seek(50) # move "cursor" to byte 50
print(data_file.read(10)) # print 10 more bytes
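The cursor behaviour can be checked offline with a small sketch. It assumes the remote file object behaves like io.BytesIO, which is used here as a stand-in; fake_file and its contents are illustrative only.

```python
import io
import os

# Stand-in file whose byte values equal their offsets (0..99).
fake_file = io.BytesIO(bytes(range(100)))

fake_file.seek(50)                  # absolute position, counted from the start
chunk = fake_file.read(10)          # bytes at offsets 50..59
pos_after_read = fake_file.tell()   # reading advances the cursor

fake_file.seek(-5, os.SEEK_END)     # position relative to the end of the file
tail = fake_file.read()             # the final 5 bytes

print(list(chunk), pos_after_read, list(tail))
```

tell reports the current cursor position, which is handy for confirming where the next read will start.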
And we can read the data directly into a byte array.
b = bytearray(100) # create a byte array of length 100
data_file.readinto(b) # read 100 bytes into the byte array
b[:]
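One detail worth knowing about readinto is that it returns the number of bytes actually read, which can be smaller than the buffer near the end of a file. The sketch below shows this with an io.BytesIO stand-in (an assumption; it is not the remote dataset).

```python
import io

fake_file = io.BytesIO(b"0123456789")   # 10 bytes of made-up data
buf = bytearray(8)                       # reusable 8-byte buffer

n_first = fake_file.readinto(buf)        # fills the whole buffer
first = bytes(buf[:n_first])

n_second = fake_file.readinto(buf)       # only 2 bytes remain in the file
second = bytes(buf[:n_second])           # slice by the count to skip stale bytes

print(n_first, first, n_second, second, bytes(buf))
```

After the second call the tail of buf still holds bytes from the first read, so slicing by the returned count matters.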
Calling getbuffer returns a readable and writable view (a memoryview) of the local, in-memory copy of the dataset, without copying it.
b = data_file.getbuffer()
b
We can use the memory buffer to make local writes. Writing to the buffer changes the contents of data_file in memory, but does not write to the remote file.
data_file.seek(100) # move "cursor" position to byte 100
b[100:110] = b"helloworld" # the `b` prefix tells Python to interpret "helloworld" as bytes
data_file.seek(100) # return "cursor" to byte 100
n = data_file.read(10) # read back the written bytes
n
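The same buffer behaviour can be reproduced offline. This sketch assumes the object returned by remote_open behaves like io.BytesIO (the getbuffer call above suggests so) and uses made-up contents.

```python
import io

original = b"0123456789"
fake_file = io.BytesIO(original)   # stand-in for the downloaded dataset

view = fake_file.getbuffer()       # memoryview over the in-memory contents
view[2:5] = b"abc"                 # replacement must have the same length as the slice

fake_file.seek(0)
modified = fake_file.read()        # reads see the bytes written through the view

view.release()                     # release the view before resizing or closing the file
print(modified)
```

While a view from getbuffer exists, a BytesIO object cannot be resized or closed, so releasing the view once you are done avoids a BufferError later.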
We have opened a remote dataset and read parts of it using random access! Use remote_open when you want access to the raw data in a dataset, e.g., if you have Python code that reads bytes in a particular format.
Note: Without some prior knowledge about the format of the dataset, remote_open is not an effective method of parsing data. Since we are reading a raw file object, we need to know the layout of the data and the data types (e.g. ints, floats). To read a dataset as a netCDF object, use remote_access.
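As an illustration of what knowing the layout means, the sketch below decodes the 16-byte indicator section that begins a GRIB edition 2 message: a 4-byte "GRIB" marker, 2 reserved bytes, a 1-byte discipline, a 1-byte edition number, and an 8-byte big-endian total message length. It runs on a synthetic header rather than the real dataset, and the values packed into it are made up.

```python
import io
import struct

# Big-endian: 4-byte marker, 2 pad bytes, two unsigned bytes, one unsigned 64-bit int.
INDICATOR_FORMAT = ">4s2xBBQ"
INDICATOR_SIZE = struct.calcsize(INDICATOR_FORMAT)   # 16 bytes

# Synthetic header (assumption: made-up discipline and length values).
header = struct.pack(INDICATOR_FORMAT, b"GRIB", 0, 2, 123456)
fake_file = io.BytesIO(header + b"\x00" * 32)        # stand-in for the remote file

fake_file.seek(0)
magic, discipline, edition, total_length = struct.unpack(
    INDICATOR_FORMAT, fake_file.read(INDICATOR_SIZE)
)
print(magic, discipline, edition, total_length)
```

With a real file object from remote_open, the same unpack call could be applied to the first 16 bytes, provided the file really starts with a GRIB2 message.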
For more information on Siphon and remote_open, see the Siphon docs.
You may also be interested in reading more about the file-like object returned by remote_open.