Often, a dataset's name contains useful information, such as the date and time associated with the dataset.
Siphon includes methods to help filter THREDDS Catalog datasets based on this type of naming convention, which makes finding the most recent dataset easier.
We will use Python's built-in `datetime` library to set the dates and times used by the filter methods.
```python
from siphon.catalog import TDSCatalog
from datetime import datetime, timedelta
```
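The filter methods only need ordinary `datetime` objects, so the time you ask for does not have to be "now". Here is a minimal sketch that builds a fixed target time and a one-hour window with `timedelta`. The date is arbitrary and only for illustration:

```python
from datetime import datetime, timedelta

# An arbitrary, fixed target time (illustrative only)
target = datetime(2021, 1, 9, 0, 30)

# Subtracting a timedelta shifts the time and rolls the date back as needed
window_start = target - timedelta(hours=1)

print(target)        # 2021-01-09 00:30:00
print(window_start)  # 2021-01-08 23:30:00
```

This window crosses midnight. If the data are stored in daily catalogs, matching datasets would then sit in two different catalogs.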
Building off of the skills developed in the [THREDDS Catalogs: The Basics](https://nbviewer.jupyter.org/github/Unidata/pyaos-ams-2021/blob/master/notebooks/dataAccess/siphon-catalog-basics.ipynb) notebook, we'll start by reading the catalog associated with the [Unidata NEXRAD Level3 n0r 1km composite THREDDS Catalog](https://thredds.unidata.ucar.edu/thredds/catalog/nexrad/composite/gini/n0r/1km/catalog.html) and looking at what is available:
```python
n0r_composite_catalog = TDSCatalog('https://thredds.unidata.ucar.edu/thredds/catalog/nexrad/composite/gini/n0r/1km/catalog.xml')
print(n0r_composite_catalog.catalog_url)
print(' catalogs: {}'.format(n0r_composite_catalog.catalog_refs))
print(' datasets: {}\n'.format(n0r_composite_catalog.datasets))
```
Once again, this THREDDS Catalog points to other catalogs named by date (`yyyyMMdd`).
Let's say we'd like to choose the catalog for today's date and list the datasets.
However, "today's date" will change depending on when you run this notebook.
No problem: `datetime` to the rescue!
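First, we can ask `datetime` for the current date and time.
The catalog and dataset names on this server use UTC. For that reason, we request the current UTC time with `datetime.utcnow()` rather than `datetime.now()`, which returns the local time of the machine running the notebook:
```python
# Naive datetime holding the current UTC date and time
current_date_time = datetime.utcnow()
print(' year: {}'.format(current_date_time.year))
print(' month: {}'.format(current_date_time.month))
print(' day: {}'.format(current_date_time.day))
print(' hour: {}'.format(current_date_time.hour))
print(' minute: {}'.format(current_date_time.minute))
print(' second: {}'.format(current_date_time.second))
print(' microseconds: {}'.format(current_date_time.microsecond))
```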
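The dataset and catalog names on this server are timestamped in UTC, while `datetime.now()` returns the machine's local time. `datetime.utcnow()` gives a naive UTC value, but it is deprecated starting with Python 3.12. Below is a small sketch of an alternative. The helper name `utc_now_naive` is hypothetical. The sketch assumes that naive UTC values are what you want to compare against times parsed from file names:

```python
from datetime import datetime, timezone

def utc_now_naive():
    """Return the current UTC time as a naive datetime (tzinfo=None)."""
    # now(timezone.utc) gives an aware UTC datetime; dropping tzinfo keeps the
    # UTC wall-clock values, so it compares cleanly with other naive UTC times.
    return datetime.now(timezone.utc).replace(tzinfo=None)

current_utc = utc_now_naive()
print(current_utc.strftime('%Y-%m-%d %H:%M:%S'))
```

Keeping the value naive avoids a `TypeError` when it is subtracted from or compared with naive datetimes.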
Since we would like the catalog associated with the current `<year><month><day>`, we can use the `datetime` object to make the appropriate string:
```python
# Format the date as YYYYMMDD to match the catalog names
catalog_name_for_today = current_date_time.strftime('%Y%m%d')
print('Read the catalog named {}'.format(catalog_name_for_today))
# Look up the catalog reference by name and follow it to the child catalog
catalog_for_today = n0r_composite_catalog.catalog_refs[catalog_name_for_today].follow()
```
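Shortly after 00 UTC, the catalog for the new day may not exist yet. In that case the dictionary-style lookup into `catalog_refs` would likely fail with a `KeyError`. One hedged workaround is to fall back to the most recent date-named catalog. The sketch below works on any collection of names. The helper `latest_date_name` and the sample names are hypothetical, and a non-date name is included only to show that it gets skipped:

```python
from datetime import datetime

def latest_date_name(names, fmt='%Y%m%d'):
    """Return the name whose date (parsed with fmt) is the most recent.

    Names that do not parse with fmt are ignored.
    """
    dated = []
    for name in names:
        try:
            dated.append((datetime.strptime(name, fmt), name))
        except ValueError:
            continue  # skip names that are not dates in this format
    if not dated:
        raise ValueError('no date-named entries found')
    # Tuples compare by their first element, the parsed datetime
    return max(dated)[1]

sample_names = ['20210107', '20210109', '20210108', 'notes']
print(latest_date_name(sample_names))  # 20210109
```

With a live catalog, the names could come from `n0r_composite_catalog.catalog_refs.keys()`.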
The `strftime` method takes a format string (in this case, `'%Y%m%d'`) that tells the `datetime` library how the date/time should look when converted into a string.
`%Y` means a four-digit year, `%m` means a two-digit month, and `%d` means a two-digit day.
A full description of all the possible format codes can be found in the Python `datetime` documentation.
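The sketch below applies these format codes to a fixed, arbitrarily chosen date. It also shows that `datetime.strptime` reverses the conversion when given the same codes:

```python
from datetime import datetime

day = datetime(2021, 1, 9)

# %Y -> 2021, %m -> 01, %d -> 09
catalog_name = day.strftime('%Y%m%d')
print(catalog_name)  # 20210109

# The same codes with separators give a more readable string
print(day.strftime('%Y-%m-%d'))  # 2021-01-09

# strptime is the inverse: it parses a string using the same format codes
parsed = datetime.strptime(catalog_name, '%Y%m%d')
print(parsed == day)  # True
```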
Now that we have the catalog containing today's data, let's examine the names of the first few datasets to get an idea of how they are named:
```python
datasets_for_today = catalog_for_today.datasets
# Display the first three datasets in the collection
datasets_for_today[0:3]
```
The dataset names should look like `Level3_Composite_n0r_1km_20210109_2355.gini`.
As we can see, the date and time are encoded within the filename.
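Pulling that timestamp out by hand takes only a regular expression and `datetime.strptime`. This minimal sketch makes the naming convention concrete and is not Siphon's own parsing code. The pattern assumes eight date digits, an underscore, and four time digits, as in the example name above:

```python
import re
from datetime import datetime

name = 'Level3_Composite_n0r_1km_20210109_2355.gini'

# Look for 8 digits (date), an underscore, then 4 digits (hour and minute)
match = re.search(r'(\d{8})_(\d{4})', name)
if match is None:
    raise ValueError('no date/time found in ' + name)

valid_time = datetime.strptime(match.group(1) + match.group(2), '%Y%m%d%H%M')
print(valid_time)  # 2021-01-09 23:55:00
```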
When the filename contains a date and time in the form `<year><month><day>_<hour><minute>`, Siphon can automatically filter for times using the `filter_time_nearest` or `filter_time_range` methods.
These methods accept `datetime` objects to describe the time (or times) you want.
For example, to find the dataset closest to the current date and time, we can pass the current UTC time to `filter_time_nearest`, like so:
```python
# Current time in UTC (naive), to match the times encoded in the dataset names
current_time = datetime.utcnow()
most_recent_dataset = datasets_for_today.filter_time_nearest(current_time)
print('Current date and time: {}'.format(current_time))
print('Most Recent Dataset: {}'.format(most_recent_dataset))
```
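Conceptually, a nearest-time filter parses a time from each name and keeps the one with the smallest absolute difference from the requested time. The sketch below reproduces that idea in plain Python on a list of hypothetical names. It illustrates the technique and is not Siphon's implementation. The helpers `name_time` and `nearest_name` are made up for this example:

```python
import re
from datetime import datetime

# Assumed naming pattern: <year><month><day>_<hour><minute>
STAMP = re.compile(r'(\d{8})_(\d{4})')

def name_time(name):
    """Parse the timestamp embedded in a dataset name, or return None."""
    match = STAMP.search(name)
    if match is None:
        return None
    return datetime.strptime(match.group(1) + match.group(2), '%Y%m%d%H%M')

def nearest_name(names, target):
    """Return the name whose embedded time is closest to target.

    Raises ValueError if no name contains a timestamp.
    """
    timed = [(name_time(n), n) for n in names]
    timed = [(t, n) for t, n in timed if t is not None]
    # The smallest absolute difference in seconds wins
    return min(timed, key=lambda tn: abs((tn[0] - target).total_seconds()))[1]

# Hypothetical dataset names, five minutes apart
names = ['Level3_Composite_n0r_1km_20210109_2345.gini',
         'Level3_Composite_n0r_1km_20210109_2350.gini',
         'Level3_Composite_n0r_1km_20210109_2355.gini']
print(nearest_name(names, datetime(2021, 1, 9, 23, 51)))
# Level3_Composite_n0r_1km_20210109_2350.gini
```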
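Similarly, `filter_time_range` returns the datasets whose times fall between a start time and an end time. For example, to list the datasets from the past hour:
```python
# Start of the window: one hour before current_time
one_hour_ago = current_time - timedelta(hours=1)
datasets_from_last_hour = datasets_for_today.filter_time_range(one_hour_ago, current_time)
print(' Start time: {}'.format(one_hour_ago))
print(' End time: {}'.format(current_time))
print(' Datasets between start and end times: {}'.format(datasets_from_last_hour))
```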
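A range filter follows the same pattern, using a start/end comparison in place of a minimum. One practical wrinkle: a one-hour window that starts before 00 UTC reaches into the previous day's catalog, which `datasets_for_today` does not include. The sketch below filters names as if they had been collected from two daily catalogs. The helper `names_in_range` and the names are hypothetical, this is not Siphon's code, and the sketch treats both ends of the window as inclusive:

```python
import re
from datetime import datetime, timedelta

# Assumed naming pattern: <year><month><day>_<hour><minute>
STAMP = re.compile(r'(\d{8})_(\d{4})')

def names_in_range(names, start, end):
    """Return names whose embedded time t satisfies start <= t <= end."""
    selected = []
    for name in names:
        match = STAMP.search(name)
        if match is None:
            continue  # no timestamp in this name
        t = datetime.strptime(match.group(1) + match.group(2), '%Y%m%d%H%M')
        if start <= t <= end:
            selected.append(name)
    return selected

# Hypothetical names spanning midnight, as if gathered from two daily catalogs
names = ['Level3_Composite_n0r_1km_20210108_2250.gini',
         'Level3_Composite_n0r_1km_20210108_2355.gini',
         'Level3_Composite_n0r_1km_20210109_0015.gini']
end = datetime(2021, 1, 9, 0, 30)
start = end - timedelta(hours=1)  # 2021-01-08 23:30
print(names_in_range(names, start, end))
```

Only the 23:55 and 00:15 names fall inside the window, and they come from different days.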
These two filter methods can make finding datasets within a catalog a bit easier for simple cases where the date and time are part of the dataset names, like the ones described in this notebook.