The main Python package we will use to work with THREDDS Catalogs is called Siphon. Siphon can read THREDDS Catalogs, which are XML documents that do one or more of the following: list available datasets, provide metadata about those datasets, describe how the datasets can be accessed, or point to other catalogs.
The XML documents themselves can be written by hand, but often they are generated by a server, such as the THREDDS Data Server. They may be read locally from an XML file, or remotely over HTTP. Siphon greatly simplifies the process of reading and using the information contained in these XML documents, allowing users to "siphon off data" from a variety of sources.
from siphon.catalog import TDSCatalog
For this notebook, we will use the Unidata demonstration TDS.
If you visit the server <https://thredds.ucar.edu/thredds/catalog/catalog.html> in your browser, you will see something like the image at the top of this notebook.
The page you see is actually a product of the TDS, and is generated by the server (we call this an HTML view of the catalog).
If you change the last part of the URL from .html to .xml (that is, <https://thredds.ucar.edu/thredds/catalog/catalog.xml>), you will see the actual THREDDS Catalog in your browser, which looks similar to this:
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" xmlns:xlink="http://www.w3.org/1999/xlink" name="Unidata THREDDS Data Server" version="1.0.1">
<dataset name="Realtime data from IDD">
<catalogRef xlink:href="idd/forecastModels.xml" xlink:title="Forecast Model Data" name=""/>
<catalogRef xlink:href="idd/forecastProdsAndAna.xml" xlink:title="Forecast Products and Analyses" name=""/>
<catalogRef xlink:href="idd/obsData.xml" xlink:title="Observation Data" name=""/>
<catalogRef xlink:href="idd/radars.xml" xlink:title="Radar Data" name=""/>
<catalogRef xlink:href="idd/satellite.xml" xlink:title="Satellite Data" name=""/>
</dataset>
<dataset name="Other Unidata Data">
<catalogRef xlink:href="casestudies/catalog.xml" xlink:title="Unidata case studies" name=""/>
</dataset>
</catalog>
This catalog tells us that there are other catalogs containing data from forecast models, radar data, satellite data, etc. In this sense, a catalog can point to other catalogs, creating a tree-like structure in which the datasets are organized. This will vary from server to server, as needs vary across organizations and groups.
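To make the structure above concrete, here is a minimal sketch that pulls the catalog references out of a shortened copy of that XML using only the Python standard library. It is not how Siphon is implemented; the names `CATALOG_XML` and `list_catalog_refs` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the catalog XML shown above (two of the catalogRef entries).
CATALOG_XML = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" xmlns:xlink="http://www.w3.org/1999/xlink" name="Unidata THREDDS Data Server" version="1.0.1">
  <dataset name="Realtime data from IDD">
    <catalogRef xlink:href="idd/radars.xml" xlink:title="Radar Data" name=""/>
    <catalogRef xlink:href="idd/satellite.xml" xlink:title="Satellite Data" name=""/>
  </dataset>
</catalog>"""

# Namespaces declared on the <catalog> element.
THREDDS_NS = {'t': 'http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0'}
XLINK = '{http://www.w3.org/1999/xlink}'


def list_catalog_refs(xml_text):
    """Map each catalogRef title to its (possibly relative) href."""
    root = ET.fromstring(xml_text)
    return {
        ref.get(XLINK + 'title'): ref.get(XLINK + 'href')
        for ref in root.iterfind('.//t:catalogRef', THREDDS_NS)
    }


refs = list_catalog_refs(CATALOG_XML)
print(refs)
```

Each title maps to an href that is relative to the catalog it came from, which is what lets one catalog point to another.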
We can use Siphon to read in this remote catalog programmatically, without the need for a web browser:
catalog = TDSCatalog('https://thredds.ucar.edu/thredds/catalog/catalog.xml')
Now that we have read in the THREDDS Catalog from the THREDDS Data Server, we can investigate what information it holds.
A list of the names of other catalogs it points to is contained within the catalog_refs instance attribute, and can be accessed as follows:
catalog.catalog_refs
There are more things you can do once you have read a THREDDS Catalog using Siphon, but for now we'll leave it at this.
If we'd like to see what is available in the 'Satellite Data' catalog, we can use the .follow() method to read in the new catalog, and look at the .catalog_refs instance attribute of the new catalog:
satellite_catalog = catalog.catalog_refs['Satellite Data'].follow()
satellite_catalog.catalog_refs
The URL of the new catalog is in the catalog_url instance attribute, and can be accessed as follows:
satellite_catalog.catalog_url
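The hrefs in the catalog XML shown earlier are relative (for example idd/satellite.xml), so they have to be resolved against the URL of the catalog that contains them. The sketch below shows that resolution with the standard library; it assumes ordinary relative-URL rules apply and is not a statement about what Siphon does internally.

```python
from urllib.parse import urljoin

# URL of the top-level catalog and the relative href of its 'Satellite Data' reference.
parent_url = 'https://thredds.ucar.edu/thredds/catalog/catalog.xml'
relative_href = 'idd/satellite.xml'

# urljoin replaces the last path segment of the parent with the relative href.
child_url = urljoin(parent_url, relative_href)
print(child_url)
```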
Any datasets described by the catalog are contained in the datasets instance attribute:
satellite_catalog.datasets
The [] indicates there are no datasets contained within the catalog.
We can continue to work our way down through the catalog structure until we reach a catalog that contains a dataset.
goes_east_grb_catalog = satellite_catalog.catalog_refs['GOES East GOES Rebroadcast (GRB)'].follow()
print(goes_east_grb_catalog.catalog_url)
print(' catalogs: {}'.format(goes_east_grb_catalog.catalog_refs))
print(' datasets: {}\n'.format(goes_east_grb_catalog.datasets))
abi_catalog = goes_east_grb_catalog.catalog_refs['ABI'].follow()
print(abi_catalog.catalog_url)
print(' catalogs: {}'.format(abi_catalog.catalog_refs))
print(' datasets: {}\n'.format(abi_catalog.datasets))
conus_catalog = abi_catalog.catalog_refs['CONUS'].follow()
print(conus_catalog.catalog_url)
print(' catalogs: {}'.format(conus_catalog.catalog_refs))
print(' datasets: {}\n'.format(conus_catalog.datasets))
channel01_catalog = conus_catalog.catalog_refs['Channel01'].follow()
print(channel01_catalog.catalog_url)
print(' catalogs: {}'.format(channel01_catalog.catalog_refs))
print(' datasets: {}\n'.format(channel01_catalog.datasets))
date_catalog = channel01_catalog.catalog_refs['20210110'].follow()
print(date_catalog.catalog_url)
print(' catalogs: {}'.format(date_catalog.catalog_refs))
print(' datasets: {}\n'.format(date_catalog.datasets))
We used the follow() method several times before finally reaching a catalog with datasets.
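The repeated follow-and-print pattern can be folded into a small helper. This is only a sketch: `follow_path` is a hypothetical name, not part of Siphon, and it relies solely on the catalog_refs attribute and follow() method used above.

```python
def follow_path(catalog, ref_names):
    """Follow a sequence of catalog reference names and return the final catalog.

    `catalog` is any object with a `catalog_refs` mapping whose values
    provide a `follow()` method, such as a Siphon TDSCatalog.
    """
    for name in ref_names:
        catalog = catalog.catalog_refs[name].follow()
    return catalog


# Usage with the top-level catalog read earlier (requires network access):
# date_catalog = follow_path(catalog, [
#     'Satellite Data',
#     'GOES East GOES Rebroadcast (GRB)',
#     'ABI',
#     'CONUS',
#     'Channel01',
#     '20210110',
# ])
```

A missing name raises a KeyError at the step where the path breaks, which makes it easy to see how far down the tree the lookup got.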
Normally, it is easiest to browse the catalogs of a TDS using a web browser in order to find a dataset collection that you might be interested in using.
Once you have found a dataset you are interested in, you can use the URL from your browser to begin working in Python using Siphon.
For this collection of data (channel 1 of the Advanced Baseline Imager instrument on the GOES East satellite, CONUS domain), the catalog <https://thredds.ucar.edu/thredds/catalog/satellite/goes/east/grb/ABI/CONUS/Channel01/catalog.xml> looks like a good place to start, as it points to catalogs named by date (yyyyMMdd).
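Since browsing happens on the HTML view while the examples in this notebook pass the .xml URL to Siphon, a tiny helper can convert one to the other. This is a sketch; `to_catalog_xml_url` is a hypothetical name, and it only handles URLs that end in .html.

```python
def to_catalog_xml_url(browser_url):
    """Swap a trailing '.html' for '.xml'; return other URLs unchanged."""
    suffix = '.html'
    if browser_url.endswith(suffix):
        return browser_url[:-len(suffix)] + '.xml'
    return browser_url


browser_url = ('https://thredds.ucar.edu/thredds/catalog/satellite/goes/east/'
               'grb/ABI/CONUS/Channel01/catalog.html')
xml_url = to_catalog_xml_url(browser_url)
print(xml_url)
```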
As mentioned at the beginning of this notebook, catalogs can expose metadata about a dataset.
The metadata instance attribute holds any metadata defined by the catalog, such as dataFormat, documentation, etc.
For example, the metadata associated with date_catalog looks like:
date_catalog.metadata
The amount of metadata contained within a catalog depends on how much effort has been put into curating the collection.
dataset = date_catalog.datasets['OR_ABI-L1b-RadC-M6C01_G16_s20210100156163_e20210100158536_c20210100158591.nc']
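The dataset name itself carries information. As a sketch, and assuming the GOES-R file naming convention in which the field beginning with s holds the scan start time as year, day of year, hours, minutes, seconds, and tenths of a second, the start time can be recovered like this (`scan_start_time` is a hypothetical helper, not part of Siphon):

```python
from datetime import datetime


def scan_start_time(dataset_name):
    """Parse the scan start time from the 's' field of a dataset name.

    Assumes the field looks like 'sYYYYJJJHHMMSSt', where JJJ is the day
    of year and t is tenths of a second (dropped here).
    """
    for field in dataset_name.split('_'):
        if field.startswith('s') and field[1:].isdigit():
            # Keep YYYYJJJHHMMSS (13 digits) and ignore the trailing tenths.
            return datetime.strptime(field[1:14], '%Y%j%H%M%S')
    raise ValueError('no start-time field found in {!r}'.format(dataset_name))


name = 'OR_ABI-L1b-RadC-M6C01_G16_s20210100156163_e20210100158536_c20210100158591.nc'
start = scan_start_time(name)
print(start)
```

Day 010 of 2021 is January 10, which matches the 20210110 catalog the dataset was found in.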
Now that we have a dataset, we can see in what ways we can access the dataset using the access_urls instance attribute:
dataset.access_urls
Each service provides a unique way of accessing the metadata or actual data contained within the dataset. Other Siphon notebooks explore ways in which the services can be used, but at this point, you are ready to begin your data analysis journey!
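Because access_urls maps service names to URLs, choosing one can be done with ordinary dictionary logic. The sketch below is hypothetical: `pick_access_url` is not a Siphon function, the service names in `preferred` are assumptions about what a server might offer, and the URLs in `example_urls` are placeholders rather than real endpoints.

```python
def pick_access_url(access_urls, preferred=('OPENDAP', 'CdmRemote', 'HTTPServer')):
    """Return (service, url) for the first preferred service that is offered."""
    for service in preferred:
        if service in access_urls:
            return service, access_urls[service]
    raise KeyError('none of {} offered; available: {}'.format(
        list(preferred), sorted(access_urls)))


# Placeholder mapping standing in for dataset.access_urls (assumed keys and URLs).
example_urls = {
    'HTTPServer': 'https://example.invalid/fileServer/file.nc',
    'OPENDAP': 'https://example.invalid/dodsC/file.nc',
}
service, url = pick_access_url(example_urls)
print(service, url)
```

With a real dataset, the same call would be pick_access_url(dataset.access_urls), with the preference order adjusted to the services your server actually lists.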