Analysis of Gridded Ensemble Precipitation and Temperature Estimates over the Contiguous United States

For this example, we'll work with a 100 member ensemble of precipitation and temperature data. The analysis we do below is quite simple but the problem is a good illustration of a common task in the atmospheric sciences.

Link to dataset: https://www.earthsystemgrid.org/dataset/gridded_precip_and_temp.html

In [ ]:
%matplotlib inline

import numpy as np
import xarray as xr
import matplotlib.pyplot as plt

Connect to Dask Distributed Cluster

In [ ]:
from dask.distributed import Client, progress
from dask_gateway import Gateway

gateway = Gateway()
cluster = gateway.new_cluster()
cluster.scale(40)
cluster
In [ ]:
client = Client(cluster)
client

Open Dataset

Here we load the GMET dataset using an Intake catalog. This catalog specifies the location of the GMET data stored in the Zarr format in GCS.

The dataset has dimensions of time, latitude, longitude, and ensmemble member.

In [ ]:
## Load with intake catalog
import intake
cat = intake.Catalog('https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/master.yaml')
ds = cat.atmosphere.gmet_v1.read_chunked()
In [ ]:
# Print dataset
display(ds)

Figure: Elevation and domain mask

A quick plot of the mask to give us an idea of our spatial domain

In [ ]:
elevation = ds['elevation'].persist()
elevation = elevation.where(elevation > 0)
elevation.plot(figsize=(10, 6))
plt.title('Domain Elevation')

Intra-ensemble range

We calculate the intra-ensemble range for all the mean daily temperature in this dataset. This gives us a sense of uncertainty.

In [ ]:
temp_mean = ds['t_mean'].mean(dim='time')
spread = (temp_mean.max(dim='ensemble')
        - temp_mean.min(dim='ensemble'))
spread

Calling compute

The expressions above didn't actually compute anything. They just build the task graph. To do the computations, we call the compute or persist methods:

In [ ]:
spread = spread.persist()
progress(spread)

Figure: Intra-ensemble range

In [ ]:
spread
spread.attrs['units'] = 'degC'
In [ ]:
spread.plot(robust=True, figsize=(10, 6))
plt.title('Intra-ensemble range in mean annual temperature')

Average seasonal snowfall

We can compute a crude estimate of average seasonal snowfall using the temperature and precipitation variables in our dataset. Here, we'll look at the first 4 ensemble members and make some maps of the seasonal total snowfall in each ensemble member.

In [ ]:
da_snow = ds['pcp'].where(ds['t_mean'] < 0.).resample(time='QS-Mar').sum('time')
seasonal_snow = da_snow.isel(ensemble=slice(0, 4)).groupby('time.season').mean('time').persist()
progress(seasonal_snow)
In [ ]:
# properly sort the seasons
seasonal_snow = seasonal_snow.sel(season=['DJF', 'MAM','JJA', 'SON'])
seasonal_snow.attrs['units'] = 'mm/season'
seasonal_snow

Figure: Average seasonal snowfall totals

In [ ]:
seasonal_snow.plot.pcolormesh(col='season', row='ensemble', cmap='Blues', robust=True)

Extract a time series of annual maximum precipitation events over a region

In the previous two examples, we've mostly reduced the time and/or ensemble dimension. Here, we'll do a reduction operation on the spatial dimension to look at a time series of extreme precipitation events near Austin, TX (30.2672° N, 97.7431° W).

In [ ]:
buf = 0.25  # look at Austin +/- 0.25 deg

ds_tx = ds.sel(lon=slice(-97.7431-buf, -97.7431+buf), lat=slice(30.2672-buf, 30.2672+buf))
In [ ]:
pcp_ann_max = ds_tx['pcp'].resample(time='AS').max('time')
In [ ]:
pcp_ann_max_ts = pcp_ann_max.max(('lat', 'lon')).persist()
progress(pcp_ann_max_ts)

Figure: Timeseries of maximum precipitation near Austin, Tx.

In [ ]:
ax = pcp_ann_max_ts.transpose().to_pandas().plot(title='Maximum precipitation near Austin, Tx', legend=False)
In [ ]: