Written by Jean-Luc Stevens
Created: October 11, 2018
Last updated: August 6, 2021
While the loss of the Aral Sea in Kazakhstan and Lake Urmia in Iran have received a lot of attention over the last few decades, this trend is a global phenomena. Reciently a number of papers have been published including one focusing on the Decline of the world's saline lakes. Many of these lakes have lost the majority of their volume over the last century, including Walker Lake (Nevada, USA) which has lost 90 percent of its volume over the last 100 years.
The following example is intended to replicate the typical processing required in change detection studies similar to the Decline of the world's saline lakes.
import warnings
import intake
import numpy as np
import holoviews as hv
from holoviews import opts
import geoviews as gv
import cartopy.crs as ccrs
from dask.array import PerformanceWarning
from colorcet import coolwarm
from holoviews.operation.datashader import rasterize
warnings.simplefilter('ignore', PerformanceWarning)
hv.extension('bokeh', width=80)
In this example, we would like to use Dask to demonstrate how image processing can be distributed across workers, either running locally or across a cluster. In the next cell, we instantiate a Dask distributed Client where we request eight, single-threaded workers and declare a memory limit of 8GB per worker. You can experiment with different memory limits (e.g 4GB) and different numbers of workers but note that each worker should only use one thread as Datashader manages its own parallelization using Numba:
# arbitrarily choose a memory limit (8GB) to demonstrate the out of core
# processing infrastructure
from dask.distributed import Client
client = Client(memory_limit=8*1e9, n_workers=8, threads_per_worker=1)
# As Datashader uses parallel Numba for raster rendering, we need to use
# single threaded Dask workers on each CPU to avoid contention.
client
To replicate this study, we first have to obtain the data from primary sources. The conventional way to obtain Landsat image data is to download it through USGS's EarthExplorer or NASA's Giovanni, but to facilitate the example two images have been downloaded from EarthExployer and cached.
The two images used by the original study are LT05_L1TP_042033_19881022_20161001_01_T1 and LC08_L1TP_042033_20171022_20171107_01_T1 from 1988/10/22 and 2017/10/22 respectively. These images contain Landsat Surface Reflectance Level-2 Science Product images.
intake
¶In the next cell, we load the Landsat-5 files into a single xarray DataArray
using intake. Data sources and caching parameters are specified in a catalog file. Intake is optional, since any other method of creating an xarray.DataArray
object would work here as well, but it makes it simpler to work with remote datasets while caching them locally.
cat = intake.open_catalog('./catalog.yml')
landsat_5 = cat.landsat_5()
landsat_5_img = landsat_5.read_chunked()
landsat_5_img
landsat_8 = cat.landsat_8()
landsat_8_img = landsat_8.read_chunked()
landsat_8_img
Now let us view some metadata about this DataArray
:
print("The shape of the DataArray is :", landsat_5_img.shape)
print("With attributes:\n ", '\n '.join('%s=%s'%(k,v) for k,v in landsat_5_img.attrs.items()))
We can use this EPSG value shown above under the crs
key to create a cartopy coordinate reference system that we will be using later on in this notebook:
crs=ccrs.UTM(11)
landsat_5_img.data[landsat_5_img.data==-9999] = np.NaN # Replace the -9999
ndvi5_array = (landsat_5_img[4]-landsat_5_img[3])/(landsat_5_img[4]+landsat_5_img[3])
ndvi5 = ndvi5_array.to_dataset(name='ndvi')[['x','y', 'ndvi']]
client.persist(ndvi5)
Now we can do this for the Landsat 8 files for the 2017 image:
landsat_8_img.data[landsat_8_img.data==-9999] = np.NaN # Replace the -9999
ndvi8_array = (landsat_8_img[4]-landsat_8_img[3])/(landsat_8_img[4]+landsat_8_img[3])
ndvi8 = ndvi8_array.to_dataset(name='ndvi')[['x','y', 'ndvi']]
client.persist(ndvi8)
import datashader as ds
from datashader import reductions as rd
x_range = (int(max(ndvi5.x.min(), ndvi8.x.min())), int(min(ndvi5.x.max(), ndvi8.x.max())))
y_range = (int(max(ndvi5.y.min(), ndvi8.y.min())), int(min(ndvi5.y.max(), ndvi8.y.max())))
plot_width = min(ndvi8.dims['x'], ndvi5.dims['x'])
plot_height = min(ndvi8.dims['y'], ndvi5.dims['y'])
cvs = ds.Canvas(plot_width=plot_width, plot_height=plot_height,
x_range = x_range, y_range = y_range)
ndvi5 = cvs.raster(ndvi5, agg= rd.mean('ndvi'))
ndvi8 = cvs.raster(ndvi8, agg= rd.mean('ndvi'))
Using datashader together with geoviews, we can now easily build an interactive visualization where we select between the 1988 and 2017 images. The use of datashader allows these images to be dynamically updated according to zoom level (Note: it can take datashader a minute to 'warm up' before it becomes fully interactive). For more information on how the dropdown widget was created using HoloMap
, please refer to the HoloMap reference.
opts.defaults(
opts.Curve(width=600, tools=['hover']),
opts.Image(cmap='viridis', width=450, height=450, tools=['hover'], colorbar=True))
hmap = hv.HoloMap({'1988':gv.Image(ndvi5, crs=crs, vdims=['ndvi'], rtol=10),
'2017':gv.Image(ndvi8, crs=crs, vdims=['ndvi'], rtol=10)},
# Error on travis
kdims=['Year']).redim(x='lon', y='lat') # Mapping 'x' and 'y' from rasterio to 'lon' and 'lat'
rasterize(hmap)
The rest of the notebook shows how statistical operations can reduce the dimensionality of the data that may be used to compute new features that may be used as part of an ML pipeline.
The next plot (may take a minute to compute) shows the mean of the two NDVI images next to the sum of them:
mean_avg = hmap.collapse(dimensions=['Year'], function=np.mean)
mean_img = gv.Image(mean_avg.data, crs=crs, kdims=['lon', 'lat'],
vdims=['ndvi']).relabel('Mean over Year')
summed = hmap.collapse(dimensions=['Year'], function=np.sum)
summed_image = gv.Image(summed.data, crs=crs, kdims=['lon', 'lat'],
vdims=['ndvi']).relabel('Sum over Year')
If you are getting a bunch of warnings, then it is possible that your data are on a different grid. We can check whether summed.data
is all null. If it is, then we'll need to regrid.
if summed.data.ndvi.isnull().all():
res = 3500 if (plot_width < 100) and (plot_height < 100) else 100
xsamples = (max(ndvi5.x.max(), ndvi8.x.max()) - min(ndvi5.x.min(), ndvi8.x.min())) / res
ysamples = (max(ndvi5.y.max(), ndvi8.y.max()) - min(ndvi5.y.min(), ndvi8.y.min())) / res
cvs = ds.Canvas(plot_width=int(xsamples), plot_height=int(ysamples), x_range = x_range, y_range = y_range)
resampled_ndvi5 = cvs.raster(ndvi5, agg= rd.mean('ndvi'))
resampled_ndvi8 = cvs.raster(ndvi8, agg= rd.mean('ndvi'))
hmap = hv.HoloMap({'1988':gv.Image(resampled_ndvi5, crs=crs, vdims=['ndvi']),
'2017':gv.Image(resampled_ndvi8, crs=crs, vdims=['ndvi'])},
kdims=['Year']).redim(x='lon', y='lat') # Mapping 'x' and 'y' from rasterio to 'lon' and 'lat'
mean_avg = hmap.collapse(dimensions=['Year'], function=np.mean)
mean_img = gv.Image(mean_avg.data, crs=crs, kdims=['lon', 'lat'],
vdims=['ndvi']).relabel('Mean over Year')
summed = hmap.collapse(dimensions=['Year'], function=np.sum)
summed_image = gv.Image(summed.data, crs=crs, kdims=['lon', 'lat'],
vdims=['ndvi']).relabel('Sum over Year')
rasterize(mean_img) + rasterize(summed_image)
The change in Walker Lake as viewed using the NDVI can be shown by subtracting the NDVI recorded in 1988 from the NDVI recorded in 2017:
diff = np.subtract(hmap['1988'].data, hmap['2017'].data)
difference = gv.Image(diff, crs=crs, kdims=['lon', 'lat'], vdims=['ndvi'])
difference = difference.relabel('Difference in NDVI').redim(ndvi='delta_ndvi')
rasterize(difference).redim.range(delta_ndvi=(-1.0,1.0)).opts(cmap=coolwarm)
You can see a large change (positive delta) in the areas where there is water, indicating a reduction in the size of the lake over this time period.
lon
and lat
¶As a final example, we can use the sample
method to slice across the difference in NDVI along (roughly) the midpoint of the latitude and the midpoint of the longitude. To do this, we define the following helper function to convert latitude/longitude into the appropriate coordinate value used by the DataSet
:
def from_lon_lat(x,y):
return crs.transform_point(x,y, ccrs.PlateCarree())
lon_y, lat_x = from_lon_lat(-118, 39) # Longitude of -118 and Latitude of 39
(difference.sample(lat=lat_x) + difference.sample(lon=lon_y)).cols(1)