THREDDS Catalogs: Filtering Datasets

Unidata AMS 2021 Student Conference

Focuses

Filtering THREDDS Catalog datasets based on time

Objectives

Find the Dataset within a THREDDS Catalog nearest to a given time
Find the Datasets within a THREDDS Catalog inside a given time range

Imports

A dataset's name often contains useful information, such as the dataset's associated date and time. Siphon includes methods that filter THREDDS Catalog datasets based on this kind of naming convention, which makes finding the dataset for a particular time, such as the most recent one, easier. We will use Python's built-in datetime module to set the dates and times passed to the filter methods.

In [ ]:

from siphon.catalog import TDSCatalog
from datetime import datetime, timedelta

Find the Dataset within a THREDDS Catalog closest to a given time

Building off of the skills developed in the [THREDDS Catalogs: The Basics](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/dataAccess/siphon-catalog-basics.ipynb) notebook, we'll start by reading the catalog associated with the [Unidata NEXRAD Level3 n0r 1km composite THREDDS Catalog](https://thredds.unidata.ucar.edu/thredds/catalog/nexrad/composite/gini/n0r/1km/catalog.html) and looking at what is available:

In [ ]:

n0r_composite_catalog = TDSCatalog('https://thredds.unidata.ucar.edu/thredds/catalog/nexrad/composite/gini/n0r/1km/catalog.xml')
print(n0r_composite_catalog.catalog_url)
print('  catalogs: {}'.format(n0r_composite_catalog.catalog_refs))
print('  datasets: {}\n'.format(n0r_composite_catalog.datasets))

Once again we see that this THREDDS Catalog points to other catalogs named by date (yyyyMMdd). Let's say we'd like to choose the catalog for today's date and list its datasets. However, "today's date" will change depending on when you run this notebook. No problem - datetime to the rescue! The dates and times in these catalog and dataset names are in UTC, so we ask datetime for the current UTC date and time rather than the local time:

In [ ]:

# Naive datetime holding the current UTC time, matching the UTC-based names
current_date_time = datetime.utcnow()
print('  year: {}'.format(current_date_time.year))
print('  month: {}'.format(current_date_time.month))
print('  day: {}'.format(current_date_time.day))
print('  hour: {}'.format(current_date_time.hour))
print('  minute: {}'.format(current_date_time.minute))
print('  second: {}'.format(current_date_time.second))
print('  microseconds: {}'.format(current_date_time.microsecond))

Since we want the catalog associated with the current <year><month><day>, we can have the datetime object build the matching string for us:

In [ ]:

catalog_name_for_today = current_date_time.strftime('%Y%m%d')
print('Read the catalog named {}'.format(catalog_name_for_today))
catalog_for_today = n0r_composite_catalog.catalog_refs[catalog_name_for_today].follow()

The strftime method takes a format string (in this case, '%Y%m%d') that tells datetime how the date and time should look when converted to a string. %Y means a four-digit year, %m a two-digit month, and %d a two-digit day. A full description of all the available format codes can be found in the Python datetime documentation.
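
As a quick illustration of these format codes, the sketch below formats a fixed, made-up date (not one read from the catalog) and then parses the result back with strptime, the inverse of strftime. The variable names are illustrative only.

```python
from datetime import datetime

# A fixed example date so the result does not depend on when this runs
example_date = datetime(2021, 1, 9, 23, 55)

# %Y%m%d gives the yyyyMMdd form used for the catalog names
catalog_style_name = example_date.strftime('%Y%m%d')
print(catalog_style_name)

# %H and %M add a two-digit hour (24-hour clock) and a two-digit minute
timestamp = example_date.strftime('%Y%m%d_%H%M')
print(timestamp)

# strptime goes the other way: string plus format in, datetime out
round_trip = datetime.strptime(timestamp, '%Y%m%d_%H%M')
print(round_trip == example_date)
```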

Now that we have the catalog containing the data for today, let's examine the names of the first few datasets to get an idea of how they are named:

In [ ]:

datasets_for_today = catalog_for_today.datasets
datasets_for_today[0:3]

The dataset names should look like Level3_Composite_n0r_1km_20210109_2355.gini. As we can see, the date and time are encoded within the filename. When the filename contains a date and time in the form <year><month><day>_<hour><minute>, Siphon can automatically filter by time using the filter_time_nearest or filter_time_range methods. These methods accept datetime objects describing the time (or times) you want. For example, to find the dataset closest to the current date and time, we can pass the current UTC time from datetime.utcnow() to filter_time_nearest, like so:

In [ ]:

# Use UTC, since the times encoded in the dataset names are in UTC
current_time = datetime.utcnow()
most_recent_dataset = datasets_for_today.filter_time_nearest(current_time)
print('Current date and time: {}'.format(current_time))
print('Most Recent Dataset: {}'.format(most_recent_dataset))
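
To make the idea behind filter_time_nearest more concrete, here is a minimal offline sketch that pulls the <year><month><day>_<hour><minute> stamp out of a few made-up dataset names and picks the one closest to a target time. It illustrates the general technique and is not Siphon's actual implementation; the file names, the regular expression, and the helper functions parse_time and nearest_by_name are all hypothetical.

```python
import re
from datetime import datetime

# Hypothetical dataset names following the pattern seen above
names = [
    'Level3_Composite_n0r_1km_20210109_2345.gini',
    'Level3_Composite_n0r_1km_20210109_2350.gini',
    'Level3_Composite_n0r_1km_20210109_2355.gini',
]

# Assumed pattern: eight digits, an underscore, then four digits
STAMP = re.compile(r'(\d{8}_\d{4})')


def parse_time(name):
    """Return the datetime encoded in a dataset name."""
    match = STAMP.search(name)
    if match is None:
        raise ValueError('no timestamp found in {!r}'.format(name))
    return datetime.strptime(match.group(1), '%Y%m%d_%H%M')


def nearest_by_name(names, target):
    """Return the name whose encoded time is closest to target."""
    return min(names, key=lambda n: abs(parse_time(n) - target))


target = datetime(2021, 1, 9, 23, 52)
print(nearest_by_name(names, target))
```

Names without a recognizable timestamp raise an error in this sketch; a real catalog may mix naming styles, so skipping such names could be a reasonable alternative.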

Find the Datasets within a THREDDS Catalog inside a given time range

We can use the filter_time_range method to find the datasets associated with the past hour by supplying start and end times:

In [ ]:

one_hour_ago = current_time - timedelta(hours=1)
datasets_from_last_hour = datasets_for_today.filter_time_range(one_hour_ago, current_time)
print('  Start time: {}'.format(one_hour_ago))
print('  End time: {}'.format(current_time))
print('  Datasets between start and end times: {}'.format(datasets_from_last_hour))
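
One thing to keep in mind: because the catalogs are named by date, a time range that crosses midnight covers datasets in more than one dated catalog, while datasets_for_today only holds one day's worth. The sketch below, which uses a hypothetical helper named catalog_names_for_range and made-up times, lists the yyyyMMdd catalog names a given range would touch.

```python
from datetime import datetime, timedelta


def catalog_names_for_range(start, end):
    """List the yyyyMMdd catalog names for every day from start to end."""
    if end < start:
        raise ValueError('end must not be earlier than start')
    names = []
    day = start.date()
    while day <= end.date():
        names.append(day.strftime('%Y%m%d'))
        day += timedelta(days=1)
    return names


# A one-hour window that crosses midnight touches two dated catalogs
end = datetime(2021, 1, 10, 0, 30)
start = end - timedelta(hours=1)
print(catalog_names_for_range(start, end))
```

Each name could then be looked up in catalog_refs and followed, with filter_time_range applied to each day's datasets and the results combined.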

These two filter methods make finding datasets within a catalog easier in simple cases where the date and time are part of the dataset names, like the ones described in this notebook.
