Hurricane Ike Maximum Water Levels

Compute the maximum water level during Hurricane Ike on a 9-million-node triangular-mesh storm surge model, and plot the results with Datashader.

In [ ]:
%matplotlib inline

from dask.distributed import Client, progress
from dask_gateway import Gateway
import xarray as xr
import numpy as np

Start a Dask cluster to crunch the data

In [ ]:
gateway = Gateway()
cluster = gateway.new_cluster()
cluster.scale(25)
cluster
In [ ]:
client = Client(cluster)

Read the data using the cloud-friendly Zarr format

In [ ]:
pangeo = 'PANGEO-GCS-S3'
In [ ]:
if pangeo=='ESIP-AWS-S3':
    import s3fs
    fs = s3fs.S3FileSystem(anon=True)
    map = s3fs.S3Map('rsignell/adcirc_test01', s3=fs)
In [ ]:
if pangeo=='PANGEO-GCS-S3':
    import gcsfs
    fs = gcsfs.GCSFileSystem(project='pangeo-181919')
    map = gcsfs.mapping.GCSMap('pangeo-data/rsignell/adcirc_test01', gcs=fs, check=False, create=False)
In [ ]:
ds = xr.open_zarr(map)
In [ ]:
ds
In [ ]:
ds['zeta']

Note that, unfortunately, Xarray does not yet understand that x and y are coordinate variables here, because the data lives on a triangular (unstructured) mesh. Hopefully Xarray will gain capabilities in this area in the future.
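As a partial workaround, `set_coords` can promote x and y from data variables to non-dimension coordinates so they travel with `zeta` through selections and reductions. A minimal sketch on a toy dataset (the variable names mirror the real one; the values are made up):

```python
import numpy as np
import xarray as xr

# Toy stand-in for the ADCIRC dataset: zeta(time, node) plus per-node
# x/y stored as plain data variables, as in the Zarr store.
ds = xr.Dataset(
    {
        "zeta": (("time", "node"), np.random.rand(4, 5)),
        "x": ("node", np.linspace(-95.5, -94.5, 5)),
        "y": ("node", np.linspace(29.0, 29.5, 5)),
    }
)

# Promote x and y to coordinates; they remain indexed by 'node' but are
# now carried along with zeta automatically.
ds = ds.set_coords(["x", "y"])
print("x" in ds.coords, "y" in ds.coords)  # True True
```

This does not make x and y *dimension* coordinates (there is no 1-D index for an unstructured mesh), so label-based selection along them still isn't possible.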

How many GB of sea surface height data do we have?

In [ ]:
ds['zeta'].nbytes/1.e9

We want to calculate the maximum water level at every cell over all time steps

In [ ]:
ds['zeta'].max(dim='time')

Dask only reads data when it is needed, so nothing has been loaded yet. In the next cell, however, we call persist to compute the result and keep it in memory on the Dask workers, and watch the progress

In [ ]:
max_var = ds['zeta'].max(dim='time').persist()
progress(max_var)
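The same lazy-then-persist pattern can be sketched locally with a toy Dask array, no cluster required (the array and chunk sizes here are arbitrary):

```python
import dask.array as da

# A 1000x1000 lazy array of ones, split into 100x100 chunks;
# no chunk is materialized yet.
x = da.ones((1000, 1000), chunks=(100, 100))

# Building the reduction is instant -- it only extends the task graph.
col_max = x.max(axis=0)

# persist() executes the graph and keeps the (still chunked) result in
# memory; compute() would instead return a single in-memory NumPy array.
col_max = col_max.persist()
print(col_max.compute()[:3])  # [1. 1. 1.]
```

With a distributed client attached, persist places the chunks on the worker nodes instead, which is why the notebook persists `max_var` before plotting it repeatedly.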

Visualize data on mesh using Datashader

In [ ]:
import numpy as np
import datashader as dshade
import holoviews as hv
import geoviews as gv
import cartopy.crs as ccrs

from holoviews.operation.datashader import datashade, rasterize

datashade.precompute = True

hv.extension('bokeh')
%opts Image RGB VectorField [width=800 height=600]
In [ ]:
import warnings
warnings.filterwarnings("ignore")
In [ ]:
import pandas as pd
In [ ]:
v = np.vstack((ds['x'], ds['y'], max_var)).T
verts = pd.DataFrame(v, columns=['x','y','vmax'])
In [ ]:
points = gv.operation.project_points(gv.Points(verts, vdims=['vmax']))
In [ ]:
tris = pd.DataFrame(ds['element'].values.astype('int')-1, columns=['v0','v1','v2'])
In [ ]:
tiles = gv.WMTS('https://maps.wikimedia.org/osm-intl/{Z}/{X}/{Y}@2x.png')
value = 'max water level'
label = '{} (m)'.format(value)
trimesh = gv.TriMesh((tris, points), label=label)
In [ ]:
%%opts Image [colorbar=True] (cmap='rainbow')

meshes = rasterize(trimesh, aggregator=dshade.mean('vmax'))
tiles * meshes

Extract a time series at a specified lon, lat location

Because Xarray does not yet understand that x and y are coordinate variables on this triangular mesh, we write our own simple function to find the closest point. If we had many query points, we could use a fancier tree-based algorithm.

In [ ]:
# find the indices of the points in (x,y) closest to the points in (xi,yi)
def nearxy(x,y,xi,yi):
    ind = np.ones(len(xi),dtype=int)
    for i in range(len(xi)):
        dist = np.sqrt((x-xi[i])**2+(y-yi[i])**2)
        ind[i] = dist.argmin()
    return ind
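For many query points, a spatial index avoids the per-point brute-force scan above. A sketch using SciPy's `cKDTree` (the `nearxy_tree` name and the sample arrays are illustrative, not part of the notebook):

```python
import numpy as np
from scipy.spatial import cKDTree

def nearxy_tree(x, y, xi, yi):
    """Indices of the (x, y) points closest to each (xi, yi) query point."""
    # Build the tree once over all mesh nodes...
    tree = cKDTree(np.column_stack((x, y)))
    # ...then answer every query in roughly O(log n) each.
    _, ind = tree.query(np.column_stack((xi, yi)))
    return ind

# Tiny sanity check: the query point (0.9, 1.1) is nearest to (1, 1).
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 2.0])
print(nearxy_tree(x, y, [0.9], [1.1]))  # [1]
```

Note this measures Euclidean distance in degrees, like the brute-force version; for a single point near the coast that is a fine approximation.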
In [ ]:
#just offshore of Galveston
lat = 29.2329856
lon = -95.1535041
In [ ]:
ind = nearxy(ds['x'].values,ds['y'].values,[lon], [lat])
In [ ]:
%%time
ds['zeta'][:,ind].plot()