import xarray as xr
import numpy as np
import pandas as pd
import fsspec
# AWS S3 filesystem; the bucket is requester-pays, so egress is billed to your AWS account
fs = fsspec.filesystem('s3', anon=False, requester_pays=True)
# anonymous access to an Open Storage Network (OSN) endpoint (defined for reference; not used below)
fs_osn = fsspec.filesystem('s3', anon=True, client_kwargs=dict(endpoint_url='https://ncsa.osn.xsede.org'))
ncfile_on_s3 = 's3://floodid-louisiana-model-data/gstofs/sample_run/fort.63.nc'
fs.size(ncfile_on_s3)/1e9 #GB
2.781632181
%%time
# open the NetCDF file lazily, straight from S3; the Dask chunks match the file's internal HDF5 chunking (see ds.zeta.encoding below)
ds = xr.open_dataset(fs.open(ncfile_on_s3), drop_variables=['nvel'], chunks={'time':1, 'node':511400})
CPU times: user 2.88 s, sys: 196 ms, total: 3.07 s Wall time: 5.12 s
ds.zeta.encoding
{'chunksizes': (1, 511400), 'fletcher32': False, 'shuffle': True, 'zlib': True, 'complevel': 2, 'source': '<File-like object S3FileSystem, floodid-louisiana-model-data/gstofs/sample_run/fort.63.nc>', 'original_shape': (40, 12784991), 'dtype': dtype('<f8'), '_FillValue': -99999.0, 'coordinates': 'time y x'}
ds.zeta
<xarray.DataArray 'zeta' (time: 40, node: 12784991)>
dask.array<open_dataset-9d8ed5545d2239c017b6d200f9ff4ff0zeta, shape=(40, 12784991), dtype=float64, chunksize=(1, 511400), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 2016-09-01T12:00:00 2016-09-02 ... 2016-09-21
    x        (node) float64 dask.array<chunksize=(511400,), meta=np.ndarray>
    y        (node) float64 dask.array<chunksize=(511400,), meta=np.ndarray>
Dimensions without coordinates: node
Attributes:
    long_name:      water surface elevation above geoid
    standard_name:  sea_surface_height_above_geoid
    location:       node
    mesh:           adcirc_mesh
    units:          m
#fs = fsspec.filesystem('s3', anon=True, skip_instance_cache=True, client_kwargs={'endpoint_url': 'https://renc.osn.xsede.org'})
#ds = xr.open_dataset(fs.get_mapper('s3://rsignellbucket2/esip/adcirc/ike'), engine='zarr',
# backend_kwargs=dict(consolidated=False), chunks={'time':90})
#fs = fsspec.filesystem('s3', anon=True, skip_instance_cache=True, client_kwargs={'endpoint_url': 'https://mghp.osn.xsede.org'})
#ds = xr.open_dataset(fs.get_mapper('s3://rsignellbucket1/esip/adcirc/ike'), engine='zarr',
# backend_kwargs=dict(consolidated=False), chunks={'time':90})
How many GB of sea surface height data do we have?
ds['zeta'].nbytes/1.e9
4.09119712
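The in-memory size is larger than the 2.8 GB file on disk because the variable is zlib-compressed (complevel 2, per the encoding above), roughly a 1.5x ratio. A quick check:
# zeta dominates the file, so this ratio is a reasonable ballpark for the compression factor
ds['zeta'].nbytes / fs.size(ncfile_on_s3)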
# Alternative: a LocalCluster on this machine would also work for smaller jobs
#from dask.distributed import LocalCluster, Client
#cluster = LocalCluster()
#client = Client(cluster)
#client
#client.close()
We want to take the maximum over the time dimension. Let's use a Dask cluster to distribute the memory and compute load, getting our work done faster!
import os
import configparser

def set_credentials(profile='default', region='us-west-2', endpoint=None, cfile=None):
    """Set AWS environment variables based on a profile from the user credentials file."""
    cp = configparser.ConfigParser()
    if not cfile:
        cfile = os.path.expanduser('~/.aws/credentials')
    if not endpoint:
        endpoint = f's3.{region}.amazonaws.com'
    cp.read(cfile)
    os.environ['AWS_ACCESS_KEY_ID'] = cp[profile]['aws_access_key_id']
    os.environ['AWS_SECRET_ACCESS_KEY'] = cp[profile]['aws_secret_access_key']
    os.environ['AWS_S3_ENDPOINT'] = endpoint
    os.environ['AWS_REGION'] = region

set_credentials()
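A quick sanity check that the credential variables are now set, without echoing the secret values:
for var in ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_S3_ENDPOINT', 'AWS_REGION']:
    print(var, 'is set' if var in os.environ else 'is MISSING')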
from dask_gateway import Gateway
import os
# instantiate dask gateway
gateway = Gateway()
# specify cluster options
options = gateway.cluster_options()
options.conda_environment = 'users/users-pangeo'
options.profile = 'Medium Worker' # 2 threads/worker
# set environment variables for cluster workers (including AWS credential environment variables)
options.environment_vars = dict(os.environ)
# create new cluster
cluster = gateway.new_cluster(options)
# get the client for the cluster
client = cluster.get_client()
%%time
n_workers=30
cluster.scale(n_workers)  # returns immediately; workers come up in the background
#client.wait_for_workers(n_workers=n_workers)  # uncomment to block until all workers are ready
CPU times: user 2.43 ms, sys: 0 ns, total: 2.43 ms Wall time: 6.34 ms
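As an aside, Gateway clusters can also scale adaptively between bounds instead of to a fixed size; a commented-out alternative in the same spirit:
#cluster.adapt(minimum=2, maximum=n_workers)  # add/remove workers with load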
client
Client-81ec0e33-56f8-11ee-828e-b6b13b65153d
Connection method: Cluster object
Cluster type: dask_gateway.GatewayCluster
Dashboard: https://nebari.esipfed.org/gateway/clusters/dev.bdfaccdf504b4547bb4a802489537821/status
#client.close()
This is the compute-intensive step: reading all the elevation data and taking the maximum over time.
%%time
max_var = ds['zeta'].max(dim='time').compute()
max_var
CPU times: user 609 ms, sys: 416 ms, total: 1.02 s Wall time: 2min
<xarray.DataArray 'zeta' (node: 12784991)>
array([nan, nan, nan, ..., nan, nan, nan])
Coordinates:
    x        (node) float64 -73.68 -73.68 -73.68 -73.68 ... -90.07 -90.07 -90.07
    y        (node) float64 42.75 42.75 42.75 42.75 ... 30.03 30.03 30.03 30.03
Dimensions without coordinates: node
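The leading values are NaN because many mesh nodes never hold a valid water level during this run (typically dry nodes in the ADCIRC mesh); a quick count:
n_nan = int(np.isnan(max_var).sum())
print(f'{n_nan:,} of {max_var.sizes["node"]:,} nodes have no valid water level')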
import numpy as np
import geoviews as gv
import hvplot.xarray
import holoviews.operation.datashader as dshade
dshade.datashade.precompute = True  # cache projected data so pan/zoom re-renders are faster
#max_var = ds['zeta'].isel(time=20)
%%time
# stack lon, lat and the max water level into a vertex table for the TriMesh
v = np.vstack((ds['x'], ds['y'], max_var)).T
verts = pd.DataFrame(v, columns=['x','y','vmax'])
# project lon/lat points to web mercator for display on map tiles
points = gv.operation.project_points(gv.Points(verts, vdims=['vmax']))
# ADCIRC element connectivity is 1-based, so shift to 0-based indices
tris = pd.DataFrame(ds['element'].values.astype('int')-1, columns=['v0','v1','v2'])
CPU times: user 4.47 s, sys: 1.23 s, total: 5.7 s Wall time: 25.5 s
tiles = gv.tile_sources.OSM
trimesh = gv.TriMesh((tris, points), label='ADCIRC Global Water Level (m)')
# rasterize the ~12.8M-node mesh with Datashader so it renders interactively
mesh = dshade.rasterize(trimesh).opts(cmap='turbo', colorbar=True, width=650, height=500)
tiles * mesh
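To keep the figure around, holoviews can write the overlay to a standalone HTML file (a commented-out suggestion; the filename is arbitrary):
#import holoviews as hv
#hv.save(tiles * mesh, 'adcirc_max_water_level.html')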
Because Xarray does not yet understand that x and y are coordinate variables on this triangular mesh, we create our own simple function to find the closest point. If we had many points to look up, we could use a fancier tree algorithm (see the cKDTree sketch below).
# find the indices of the points in (x,y) closest to the points in (xi,yi)
def nearxy(x, y, xi, yi):
    ind = np.ones(len(xi), dtype=int)
    for i in range(len(xi)):
        dist = np.sqrt((x - xi[i])**2 + (y - yi[i])**2)
        ind[i] = dist.argmin()
    return ind
ds['x'].min().values  # quick sanity check: westernmost node longitude
#just offshore of Galveston
lat = 29.0329856
lon = -95.1535041
ind = nearxy(ds['x'].values,ds['y'].values,[lon], [lat])
ds['zeta'][:,ind].hvplot(x='time', grid=True)
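For many query points, building a spatial index once beats the brute-force loop. A minimal sketch assuming scipy is available in this environment (it uses the same planar lon/lat distance as nearxy):
from scipy.spatial import cKDTree
# build the tree once over all mesh nodes, then query in bulk
tree = cKDTree(np.column_stack([ds['x'].values, ds['y'].values]))
_, ind_tree = tree.query([[lon, lat]])
assert (ind_tree == ind).all()  # should match the brute-force result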
Be a good citizen and shut down your cluster when you are done using it.
cluster.shutdown()
/home/conda/users/226e91a965d185c2173ace6d0daf5747b886d5f2b5299f8dbce39e2e7fb456a3-20230830-192612-585397-241-pangeo/lib/python3.10/site-packages/dask_gateway/client.py:1014: RuntimeWarning: coroutine 'rpc.close_rpc' was never awaited
  self.scheduler_comm.close_rpc()
This RuntimeWarning comes from dask_gateway's own shutdown path and can be safely ignored.