This notebook will demonstrate how to use Siphon to subset and download data using the NetcdfSubset service (NCSS). NCSS supports coordinate-based subsetting, i.e. selecting data by latitude, longitude, time, etc.
Before beginning, let's import the packages to be used throughout this training:
import matplotlib.pyplot as plt
import numpy as np
from siphon.catalog import TDSCatalog
from datetime import datetime, timedelta
Our first step is to find a dataset that we'd like to access and subset.
In this example, we'll use the latest GFS Quarter Degree Forecast dataset from the Unidata THREDDS catalog.
Let's start with the top level catalog:
top_cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog.xml')
And then navigate down two levels to the GFS catalog:
models_cat = top_cat.catalog_refs[0].follow()  # follow() returns a TDSCatalog for the referenced catalog
gfs_cat = models_cat.catalog_refs['GFS Quarter Degree Forecast'].follow()
Finally, we get a handle to our dataset using the latest attribute:
ds = gfs_cat.latest
ds.name
We can now view the access protocols available for our dataset.
list(ds.access_urls)
This list includes the NetcdfSubset service (NCSS), which is the service we'll use to subset and download our data.
To use the NetcdfSubset service, we first call subset() to get an NCSS client.
ncss = ds.subset()
With this client, we can view the variables in our dataset.
list(ncss.variables)
We can also access the metadata, which is returned as an NCSSDataset object.
metadata = ncss.metadata
# print metadata
print("time span: " + str(metadata.time_span))
print("\naccept list: " + str(metadata.accept_list))
print("\nlat_lon_box: " + str(metadata.lat_lon_box))
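The lat_lon_box metadata can be used to sanity-check a request before sending it. The helper below is a hypothetical sketch, not part of Siphon: it assumes lat_lon_box behaves like a dict with 'west', 'east', 'south' and 'north' keys in degrees, and that the requested longitude uses the same convention (-180 to 180 or 0 to 360) as the dataset.

```python
def point_in_box(lon, lat, box):
    """Return True if (lon, lat) lies inside box.

    Hypothetical helper: box is assumed to be a dict with 'west', 'east',
    'south' and 'north' keys, using the same longitude convention as lon.
    """
    return box['west'] <= lon <= box['east'] and box['south'] <= lat <= box['north']


# Example with a made-up bounding box (not the real GFS metadata).
example_box = {'west': -180.0, 'east': 180.0, 'south': -90.0, 'north': 90.0}
print(point_in_box(-105, 40, example_box))  # True
print(point_in_box(-105, 95, example_box))  # False
```

If the dataset reports longitudes from 0 to 360, a point at -105 would first need converting, e.g. -105 % 360 gives 255.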
We will use this metadata to create our subset query in the next section.
We can now use our NCSS client to create a query for the data we want.
In this example, we'll request a subset of data containing the next 24 hours of forecast at a single point.
First, we create a query object.
query = ncss.query()
Next, we populate the query to request the data we want.
query.lonlat_point(lon=-105, lat=40) # set coordinates of point of interest.
now = datetime.utcnow()  # get current time in UTC (as a naive datetime)
query.time_range(now, now + timedelta(days=1)) # create time range of 24 hours
query.variables('Temperature_surface') # request surface temperature variable
query.accept('netcdf4')  # request the data in netCDF4 format
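Behind the scenes, an NCSS request is an HTTP GET whose query string carries these constraints. The sketch below builds a comparable query string with only the standard library, to show roughly what the query object encodes. The parameter names (var, latitude, longitude, time_start, time_end, accept) follow the THREDDS NCSS conventions; the base URL is a placeholder, and the exact string Siphon sends may differ in detail.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

# Placeholder endpoint (assumption); in practice the NCSS URL comes from the
# dataset's NetcdfSubset access URL.
BASE_URL = 'https://example.com/thredds/ncss/grid/some/dataset'

start = datetime(2024, 1, 1, 0, 0)  # fixed start time so the output is reproducible
end = start + timedelta(days=1)     # 24-hour window, as in the query above

params = {
    'var': 'Temperature_surface',     # variable to return
    'latitude': 40,                   # point of interest
    'longitude': -105,
    'time_start': start.isoformat(),  # ISO 8601 time range
    'time_end': end.isoformat(),
    'accept': 'netcdf4',              # output format
}
url = BASE_URL + '?' + urlencode(params)
print(url)
```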
Once our query is fully populated, we can request the data.
point_data = ncss.get_data(query)
list(point_data.variables)
Finally, let's plot our returned data.
temp = point_data.variables['Temperature_surface'][:] # get surface temperature data
time = point_data.variables['time'][:] # get time data
plt.plot(time, temp, 'k-'); # plot data
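The plot above uses raw time offsets on the x-axis. To label it with real dates, the offsets can be combined with the time variable's units attribute. The sketch below assumes a units string of the form '<unit> since <ISO timestamp>' (for example 'Hour since 2024-01-01T00:00:00Z'); check the actual value with point_data.variables['time'].units. It also assumes Temperature_surface is stored in Kelvin. For the general case, netCDF4.num2date or cftime.num2date are more robust than this hand-rolled parser.

```python
from datetime import datetime, timedelta

# Seconds per unit for the unit names this sketch understands (an assumption;
# real files may use other spellings).
UNIT_SECONDS = {'second': 1, 'minute': 60, 'hour': 3600, 'day': 86400}


def decode_times(offsets, units):
    """Convert numeric offsets to datetimes using a '<unit> since <time>' string."""
    unit, _, origin = units.partition(' since ')
    unit = unit.strip().lower().rstrip('s')  # e.g. 'Hours' -> 'hour'
    base = datetime.fromisoformat(origin.strip().rstrip('Z'))  # drop trailing 'Z'
    step = UNIT_SECONDS[unit]
    return [base + timedelta(seconds=float(x) * step) for x in offsets]


def kelvin_to_celsius(values):
    """Convert a sequence of temperatures from Kelvin to degrees Celsius."""
    return [v - 273.15 for v in values]


times = decode_times([0, 3, 6], 'Hour since 2024-01-01T00:00:00Z')
print(times[-1])  # 2024-01-01 06:00:00
print(kelvin_to_celsius([273.15, 300.0]))
```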
We can also request data for a region using a bounding box.
We start by creating a query object, just as before.
query = ncss.query()
We will populate this query much as before, except that we'll use lonlat_box instead of lonlat_point, and request a single time 24 hours from now rather than a time range.
query.lonlat_box(east=-80, west=-90, south=35, north=45) # set bounding coordinates
query.time(now + timedelta(days=1))
query.variables('Temperature_surface')
query.accept('netcdf4')
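Before requesting a box, it can help to estimate how much data it will return. Assuming the quarter-degree GFS grid has a point every 0.25° and that grid points line up with the box edges, the box above spans 10° in each direction, giving 10 / 0.25 + 1 = 41 points in latitude and 41 in longitude, or 1,681 grid points per variable per time. The hypothetical helper below does the same arithmetic:

```python
def grid_points(west, east, south, north, spacing=0.25):
    """Estimate the number of grid points in a lon/lat box on a regular grid.

    Assumes grid points fall exactly on the box edges; real counts can
    differ by a row or column depending on how the grid is aligned.
    """
    n_lon = round((east - west) / spacing) + 1
    n_lat = round((north - south) / spacing) + 1
    return n_lon * n_lat


print(grid_points(west=-90, east=-80, south=35, north=45))  # 1681
```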
Again, we request the data using get_data.
grid_data = ncss.get_data(query)
list(grid_data.variables)
Then we plot the surface temperature forecast for our region of interest at the forecast time nearest to 24 hours from now.
temp = grid_data.variables['Temperature_surface']
lat = grid_data.variables['lat']
lon = grid_data.variables['lon']
plt.pcolormesh(lon[:], lat[:], temp[0], shading='auto');
plt.title(temp.name);
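With shading='auto' and coordinate arrays the same length as the data, Matplotlib centers each cell on its lat/lon value. If you would rather pass explicit cell edges with shading='flat', one common approach for a regular 1-D coordinate, sketched here, is to place edges halfway between neighboring centers and extend half a step past each end. The helper name cell_edges is hypothetical.

```python
def cell_edges(centers):
    """Return len(centers) + 1 edges for a 1-D coordinate of cell centers.

    Interior edges are midpoints between neighbors; the outer edges extend
    half a spacing beyond the first and last centers. Works for ascending
    or descending coordinates.
    """
    if len(centers) < 2:
        raise ValueError('need at least two centers')
    mids = [(a + b) / 2 for a, b in zip(centers[:-1], centers[1:])]
    first = centers[0] - (mids[0] - centers[0])
    last = centers[-1] + (centers[-1] - mids[-1])
    return [first] + mids + [last]


print(cell_edges([35.0, 35.25, 35.5]))  # [34.875, 35.125, 35.375, 35.625]
```

The edges could then be passed as plt.pcolormesh(cell_edges(lon[:]), cell_edges(lat[:]), temp[0], shading='flat').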
Try creating your own NCSS query to request different subsets of data, e.g. different regions, times, or variables.