CFSR data were obtained from https://rda.ucar.edu/datasets/ds093.1 as GRIB2 files, converted to NetCDF4 using wgrib2, and then written to HSDS using h5pyd. The HSDS data is accessed here using xarray, which uses h5netcdf, which in turn uses h5pyd as a drop-in replacement for h5py, so it reads an HSDS dataset instead of an HDF5 file. This makes HSDS effectively transparent to the user, who can open a URL with xarray and begin working.
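As a rough sketch of the ingest steps described above, the snippet below only builds the two command lines and prints them; it does not run anything. The file name, the HSDS domain path, and the helper `build_ingest_commands` are hypothetical. It assumes `wgrib2` with its `-netcdf` output option and the `hsload` utility that ships with h5pyd; any extra options for NetCDF-4 output, concatenation, or rechunking are left out because the original workflow for them is not shown here.

```python
from pathlib import Path


def build_ingest_commands(grib_path, domain):
    """Return the two command lines (as argument lists) for a GRIB2 -> NetCDF -> HSDS ingest.

    grib_path and domain are illustrative placeholders, not the author's actual paths.
    """
    # Derive the NetCDF file name by swapping the last suffix for .nc
    nc_path = str(Path(grib_path).with_suffix(".nc"))
    # Step 1: convert GRIB2 to NetCDF with wgrib2
    convert = ["wgrib2", grib_path, "-netcdf", nc_path]
    # Step 2: load the NetCDF (HDF5-based) file into an HSDS domain with hsload
    upload = ["hsload", nc_path, domain]
    return [convert, upload]


commands = build_ingest_commands("tmp2m.example.grb2", "/home/myuser/tmp2m.example.nc")
for cmd in commands:
    print(" ".join(cmd))
```

Passing each list to `subprocess.run(cmd, check=True)` would execute the steps, provided both tools are installed and HSDS credentials are configured.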
%matplotlib inline
import xarray as xr
import h5pyd
import warnings
warnings.simplefilter('ignore')
To access data from HSDS, you need a username and password from the provider: this HSDS endpoint is on XSEDE, and controlled by jreadey@hdfgroup.org.
Without credentials (stored in your ~/.hscfg file) the following attempt to get server info will fail.
h5pyd.getServerInfo()
{'about': 'HSDS is a webservice for HSDS data', 'endpoint': 'http://149.165.156.12:5101', 'greeting': 'Welcome to HSDS!', 'hsds_version': '0.1', 'name': 'HSDS XSEDE Jetstream', 'password': '***************', 'start_time': 1515637934, 'state': 'READY', 'username': 'rsignell'}
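For readers who have not set up credentials yet, here is a minimal sketch of what the contents of `~/.hscfg` might look like. It assumes the `hs_endpoint`, `hs_username`, and `hs_password` keys that h5pyd reads from that file; the helper `render_hscfg` and the user name and password are placeholders. The endpoint is the one reported by the server info above. The snippet only renders the text and does not write to your home directory.

```python
def render_hscfg(endpoint, username, password):
    """Render the text of an h5pyd-style config file as key = value lines."""
    lines = [
        f"hs_endpoint = {endpoint}",
        f"hs_username = {username}",
        f"hs_password = {password}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder credentials: replace with the values issued by the provider
text = render_hscfg("http://149.165.156.12:5101", "myuser", "mypassword")
print(text)
```

Saving the printed text as `~/.hscfg` (readable only by you) should let `h5pyd.getServerInfo()` pick up the credentials without passing them explicitly.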
url_hsds = 'http://149.165.156.12:5101/home/rsignell/tmp2m_2017_rechunked.nc'
ds_hsds = xr.open_dataset(url_hsds, engine='h5netcdf', decode_cf=True)
ds_hsds
<xarray.Dataset>
Dimensions:            (latitude: 880, longitude: 1760, time: 8016)
Coordinates:
  * latitude           (latitude) float64 -89.84 -89.64 -89.44 -89.23 -89.03 ...
  * longitude          (longitude) float64 0.0 0.2045 0.4091 0.6136 0.8182 ...
  * time               (time) datetime64[ns] 2017-01-01T01:00:00 ...
Data variables:
    TMP_2maboveground  (time, latitude, longitude) float64 ...
Attributes:
    Conventions:               COARDS
    GRIB2_grid_template:       40
    History:                   Fri Dec 22 14:18:00 2017: ncrcat tmp2m.cdas1.2...
    NCO:                       4.7.0
    nco_openmp_thread_number:  1
ds_hsds['TMP_2maboveground']
<xarray.DataArray 'TMP_2maboveground' (time: 8016, latitude: 880, longitude: 1760)>
[12415180800 values with dtype=float64]
Coordinates:
  * latitude   (latitude) float64 -89.84 -89.64 -89.44 -89.23 -89.03 -88.82 ...
  * longitude  (longitude) float64 0.0 0.2045 0.4091 0.6136 0.8182 1.023 ...
  * time       (time) datetime64[ns] 2017-01-01T01:00:00 2017-01-01T02:00:00 ...
Attributes:
    least_significant_digit:  2
    level:                    2 m above ground
    long_name:                Temperature
    short_name:               TMP_2maboveground
    units:                    K
%time ds_hsds['TMP_2maboveground'].sel(time='2017-01-15 04:00').plot()
CPU times: user 252 ms, sys: 78 ms, total: 330 ms
Wall time: 546 ms
<matplotlib.collections.QuadMesh at 0x7fe16a2acdd8>
start = '2017-01-01'
stop = '2017-03-01'
(lon0, lat0) = (-71.5+360, 41.5)
loc_hsds = ds_hsds.sel(longitude=lon0, latitude=lat0, method='nearest')
d_hsds = loc_hsds.sel(time=slice(start,stop))
%time d_hsds['TMP_2maboveground'].plot(figsize=(12,2));
CPU times: user 59 ms, sys: 2 ms, total: 61 ms
Wall time: 683 ms
[<matplotlib.lines.Line2D at 0x7fe16a1a9668>]
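The `-71.5+360` in the cell above is needed because the dataset's longitude coordinate starts at 0.0 and increases eastward, so western-hemisphere longitudes have to be shifted into the 0 to 360 range before calling `.sel`. A small helper (the name `to_0_360` is just for illustration) does the same conversion for any input:

```python
def to_0_360(lon_deg):
    """Map a longitude in degrees (for example in -180..180) onto the 0..360 convention."""
    # Python's % returns a non-negative result for a positive modulus
    return lon_deg % 360.0


lon_wrapped = to_0_360(-71.5)
print(lon_wrapped)
```

Longitudes that are already in the 0 to 360 range pass through unchanged, so the helper is safe to apply to every query point.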
f = h5pyd.File(url_hsds,'r')
ds = f['TMP_2maboveground']
print(ds.chunks)
[4, 220, 440]
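The chunk shape printed above helps explain why both the map and the time-series requests return quickly. As a back-of-the-envelope sketch, the code below counts how many chunks each access pattern touches, using the array shape and float64 dtype reported earlier. The index ranges are illustrative assumptions: the time series is taken to span 1439 consecutive hourly steps starting at index 0, and the grid-point indices are arbitrary.

```python
from math import prod

shape = (8016, 880, 1760)   # (time, latitude, longitude), from the dataset repr
chunks = (4, 220, 440)      # chunk shape reported by h5pyd
itemsize = 8                # bytes per float64 value


def chunks_touched(selection, chunk_shape):
    """Count chunks overlapped by a selection given as half-open (start, stop) ranges per dimension."""
    counts = []
    for (start, stop), c in zip(selection, chunk_shape):
        first = start // c          # index of the first chunk hit along this dimension
        last = (stop - 1) // c      # index of the last chunk hit along this dimension
        counts.append(last - first + 1)
    return prod(counts)


chunk_bytes = prod(chunks) * itemsize

# One time step, full global grid (the map plot); the time index is arbitrary
map_chunks = chunks_touched([(336, 337), (0, shape[1]), (0, shape[2])], chunks)

# Many time steps at a single grid point (the time-series plot); indices are illustrative
series_chunks = chunks_touched([(0, 1439), (600, 601), (1500, 1501)], chunks)

print(f"bytes per uncompressed chunk: {chunk_bytes}")
print(f"chunks read for one global map: {map_chunks}")
print(f"chunks read for the point time series: {series_chunks}")
```

Under these assumptions neither request touches more than a few hundred of the tens of thousands of chunks in the array, which is the kind of compromise between spatial and temporal access that a chunk shape like this one makes possible.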