# Analysis of Gridded Ensemble Precipitation and Temperature Estimates over the Contiguous United States¶

For this example, we'll work with a 100 member ensemble of precipitation and temperature data. The analysis we do below is quite simple but the problem is a good illustration of a common task in the atmospheric sciences.

In [ ]:
%matplotlib inline

import numpy as np
import xarray as xr
import matplotlib.pyplot as plt


### Connect to Dask Distributed Cluster¶

In [ ]:
from dask.distributed import Client, progress

gateway = Gateway()
cluster = gateway.new_cluster()
cluster.scale(40)
cluster

In [ ]:
client = Client(cluster)
client


### Open Dataset¶

Here we load the GMET dataset using an Intake catalog. This catalog specifies the location of the GMET data stored in the Zarr format in GCS.

The dataset has dimensions of time, latitude, longitude, and ensmemble member.

In [ ]:
## Load with intake catalog
import intake
cat = intake.Catalog('https://raw.githubusercontent.com/pangeo-data/pangeo-datastore/master/intake-catalogs/master.yaml')

In [ ]:
# Print dataset
display(ds)


### Figure: Elevation and domain mask¶

A quick plot of the mask to give us an idea of our spatial domain

In [ ]:
elevation = ds['elevation'].persist()
elevation = elevation.where(elevation > 0)
elevation.plot(figsize=(10, 6))
plt.title('Domain Elevation')


### Intra-ensemble range¶

We calculate the intra-ensemble range for all the mean daily temperature in this dataset. This gives us a sense of uncertainty.

In [ ]:
temp_mean = ds['t_mean'].mean(dim='time')
- temp_mean.min(dim='ensemble'))


### Calling compute¶

The expressions above didn't actually compute anything. They just build the task graph. To do the computations, we call the compute or persist methods:

In [ ]:
spread = spread.persist()


#### Figure: Intra-ensemble range¶

In [ ]:
spread

In [ ]:
spread.plot(robust=True, figsize=(10, 6))
plt.title('Intra-ensemble range in mean annual temperature')


### Average seasonal snowfall¶

We can compute a crude estimate of average seasonal snowfall using the temperature and precipitation variables in our dataset. Here, we'll look at the first 4 ensemble members and make some maps of the seasonal total snowfall in each ensemble member.

In [ ]:
da_snow = ds['pcp'].where(ds['t_mean'] < 0.).resample(time='QS-Mar').sum('time')
seasonal_snow = da_snow.isel(ensemble=slice(0, 4)).groupby('time.season').mean('time').persist()
progress(seasonal_snow)

In [ ]:
# properly sort the seasons
seasonal_snow = seasonal_snow.sel(season=['DJF', 'MAM','JJA', 'SON'])
seasonal_snow.attrs['units'] = 'mm/season'
seasonal_snow


#### Figure: Average seasonal snowfall totals¶

In [ ]:
seasonal_snow.plot.pcolormesh(col='season', row='ensemble', cmap='Blues', robust=True)


### Extract a time series of annual maximum precipitation events over a region¶

In the previous two examples, we've mostly reduced the time and/or ensemble dimension. Here, we'll do a reduction operation on the spatial dimension to look at a time series of extreme precipitation events near Austin, TX (30.2672° N, 97.7431° W).

In [ ]:
buf = 0.25  # look at Austin +/- 0.25 deg

ds_tx = ds.sel(lon=slice(-97.7431-buf, -97.7431+buf), lat=slice(30.2672-buf, 30.2672+buf))

In [ ]:
pcp_ann_max = ds_tx['pcp'].resample(time='AS').max('time')

In [ ]:
pcp_ann_max_ts = pcp_ann_max.max(('lat', 'lon')).persist()
progress(pcp_ann_max_ts)


#### Figure: Timeseries of maximum precipitation near Austin, Tx.¶

In [ ]:
ax = pcp_ann_max_ts.transpose().to_pandas().plot(title='Maximum precipitation near Austin, Tx', legend=False)

In [ ]: