Pattern as Path - landsat

Here we are demonstrating a new functionality within Intake, which can parse and make use of information stored in the filenames of a given dataset. This notebook demonstrates the functionality from the point of view of the end-user/data-scientist: you get the information you want, based on a spec in a catalog file (more on how to create these is in this example). No more writing messy loops and parsing code to extract the information yourself. Link to the blog post.

In this notebook we'll be using landsat 8 data hosted on Google Cloud Storage to compute the Normalized Difference Vegetation Index (NDVI) - a measurement of land-cover. Landsat data consists of multiple georeferenced images of the an area at a given time. These images are stored as separate files with each file representing a different frequency. For this NDVI calculation we'll need Band 4 (Red) and Band 5 (Near Infrared).

In [ ]:
import intake

First we'll open the catalog and take a look at the data sources defined within:

In [ ]:
cat = intake.Catalog('catalog.yml')
list(cat)

For this example we will be using google_landsat_8. We can learn more about the data source by reading the description.

In [ ]:
google_landsat_8 = cat.google_landsat_8()
print(google_landsat_8.description)

We will use the to_dask method to load the files' metadata into an xarray object:

In [ ]:
ds = google_landsat_8.to_dask()
ds

The values for band are parsed from the filenames and added to the data. Although we haven't yet loaded the data, the values for the band coordinates are already known.

In [ ]:
ds.coords

Computation

Now we will set up the NDVI computation. Since we have the band identifiers, we can select bands based on these values. This means that even if all 12 bands were loaded, this computation would still be valid.

In [ ]:
NDVI = (ds.sel(band=5) - ds.sel(band=4)) / (ds.sel(band=5) + ds.sel(band=4))
NDVI

NDVI is not yet computed, and the data have not yet been fetched from the remote source. Before doing the computation, we will select a region of interest (ROI). Since the images are chunked, by only using a subset of the data, we will only be downloading a subset of the data. The .compute() method is called to tell dask to load the data into memory.

In [ ]:
ROI = NDVI.sel(x=slice(4.7e5, 5.1e5), y=slice(4.44e6, 4.41e6)).compute()

Visualization

Now that we have computed our NDVI for the region of interest, we can use hvplot to visualize the NDVI.

In [ ]:
import hvplot.xarray
In [ ]:
ndvi_plot = ROI.hvplot('x', 'y', cmap='viridis', datashade=True, width=600, height=500)
ndvi_plot

NOTE: This plot is dynamically computed on zoom, so re-render on zoom will only work in a live python session (a notebook with a running kernel)

We can save the plot as html:

In [ ]:
import holoviews as hv

hv.renderer('bokeh').save(ndvi_plot, 'ndvi_plot')