For this example, we'll work with a 100 member ensemble of precipitation and temperature data. The analysis we do below is quite simple but the problem is a good illustration of a common task in the atmospheric sciences.
Link to dataset: https://www.earthsystemgrid.org/dataset/gridded_precip_and_temp.html
%matplotlib inline import numpy as np import xarray as xr import matplotlib.pyplot as plt
from dask.distributed import Client, progress from dask_gateway import Gateway gateway = Gateway() cluster = gateway.new_cluster() cluster.scale(40) cluster
client = Client(cluster) client
## Load with intake catalog import intake cat = intake.Catalog('https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/master.yaml') ds = cat.atmosphere.gmet_v1.read_chunked()
# Print dataset display(ds)
elevation = ds['elevation'].persist() elevation = elevation.where(elevation > 0) elevation.plot(figsize=(10, 6)) plt.title('Domain Elevation')
We calculate the intra-ensemble range for all the mean daily temperature in this dataset. This gives us a sense of uncertainty.
temp_mean = ds['t_mean'].mean(dim='time') spread = (temp_mean.max(dim='ensemble') - temp_mean.min(dim='ensemble')) spread
The expressions above didn't actually compute anything. They just build the task graph. To do the computations, we call the
spread = spread.persist() progress(spread)
spread spread.attrs['units'] = 'degC'
spread.plot(robust=True, figsize=(10, 6)) plt.title('Intra-ensemble range in mean annual temperature')
We can compute a crude estimate of average seasonal snowfall using the temperature and precipitation variables in our dataset. Here, we'll look at the first 4 ensemble members and make some maps of the seasonal total snowfall in each ensemble member.
da_snow = ds['pcp'].where(ds['t_mean'] < 0.).resample(time='QS-Mar').sum('time') seasonal_snow = da_snow.isel(ensemble=slice(0, 4)).groupby('time.season').mean('time').persist() progress(seasonal_snow)
# properly sort the seasons seasonal_snow = seasonal_snow.sel(season=['DJF', 'MAM','JJA', 'SON']) seasonal_snow.attrs['units'] = 'mm/season' seasonal_snow
seasonal_snow.plot.pcolormesh(col='season', row='ensemble', cmap='Blues', robust=True)
In the previous two examples, we've mostly reduced the time and/or ensemble dimension. Here, we'll do a reduction operation on the spatial dimension to look at a time series of extreme precipitation events near Austin, TX (30.2672° N, 97.7431° W).
buf = 0.25 # look at Austin +/- 0.25 deg ds_tx = ds.sel(lon=slice(-97.7431-buf, -97.7431+buf), lat=slice(30.2672-buf, 30.2672+buf))
pcp_ann_max = ds_tx['pcp'].resample(time='AS').max('time')
pcp_ann_max_ts = pcp_ann_max.max(('lat', 'lon')).persist() progress(pcp_ann_max_ts)
ax = pcp_ann_max_ts.transpose().to_pandas().plot(title='Maximum precipitation near Austin, Tx', legend=False)