Here we are demonstrating a new functionality within Intake, which can parse and make use of information stored in the filenames of a given dataset. This notebook demonstrates the functionality from the point of view of the end-user/data-scientist: you get the information you want, based on a spec in a catalog file (more on how to create these is in this example). No more writing messy loops and parsing code to extract the information yourself. Link to the blog post.
In this notebook we'll be using landsat 8 data hosted on Google Cloud Storage to compute the Normalized Difference Vegetation Index (NDVI) - a measurement of land-cover. Landsat data consists of multiple georeferenced images of the an area at a given time. These images are stored as separate files with each file representing a different frequency. For this NDVI calculation we'll need Band 4 (Red) and Band 5 (Near Infrared).
import intake
First we'll open the catalog and take a look at the data sources defined within:
cat = intake.open_catalog('catalog.yml')
list(cat)
For this example we will be using google_landsat_8
. We can learn more about the data source by reading the description
.
google_landsat_8 = cat.google_landsat_8()
print(google_landsat_8.description)
We will use the to_dask
method to load the files' metadata into an xarray
object:
ds = google_landsat_8.to_dask()
ds
The values for band
are parsed from the filenames and added to the data. Although we haven't yet loaded the data, the values for the band
coordinates are already known.
ds.coords
Now we will set up the NDVI computation. Since we have the band identifiers, we can select bands based on these values. This means that even if all 12 bands were loaded, this computation would still be valid.
NDVI = (ds.sel(band=5) - ds.sel(band=4)) / (ds.sel(band=5) + ds.sel(band=4))
NDVI
NDVI is not yet computed, and the data have not yet been fetched from the remote source. Before doing the computation, we will select a region of interest (ROI). Since the images are chunked, by only using a subset of the data, we will only be downloading a subset of the data. The .compute()
method is called to tell dask
to load the data into memory.
ROI = NDVI.sel(x=slice(4.7e5, 5.1e5), y=slice(4.44e6, 4.41e6)).compute()
Now that we have computed our NDVI for the region of interest, we can use hvplot
to visualize the NDVI.
import hvplot.xarray
ndvi_plot = ROI.hvplot('x', 'y', cmap='viridis', datashade=True, width=600, height=500, flip_yaxis=True)
ndvi_plot
NOTE: This plot is dynamically computed on zoom, so re-render on zoom will only work in a live python session (a notebook with a running kernel)
We can save the plot as html:
import holoviews as hv
hv.renderer('bokeh').save(ndvi_plot, 'ndvi_plot')