Water Observations from Space (WOFS)

Background

Understanding Australia's flood history is an important part of making better predictions about how we will be affected by flooding in the future.

To this end, Geoscience Australia developed the Australian Water Observations from Space (WOFS) algorithm. WOFS provides an estimate of how often water was observed at a particular location. This water detection algorithm is significantly better at identifying water than either the Landsat QA water flag or the NDWI index.

For more information, visit this website: http://www.ga.gov.au/scientific-topics/hazards/flood/wofs

Preliminary steps

In [1]:
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')
In [2]:
# Load Data Cube Configuration
import datacube
dc = datacube.Datacube(app='my_app')

# Import Data Cube API
import utils.data_cube_utilities.data_access_api as dc_api  
api = dc_api.DataAccessApi()

# Import other required packages
import matplotlib.pyplot as plt
import numpy as np  
import xarray as xr  

Define product and extent

Available extents

We've listed the available ingested data that you can explore in the ODC Sandbox. The latitude, longitude and time ranges correspond to the boundaries of the ingested data cubes. You'll be able to explore sub-samples of these cubes. You'll also need to provide the platform, product and resolution information for the cube you're subsampling.

LS8 Caqueta

Platform: 'LANDSAT_8'
Product: 'ls8_collection1_AMA_ingest'
Latitude: (0.000134747292617865, 1.077843593651382)
Longitude: (-74.91935994831539, -73.30266193148462)
Time: ('2013-04-13', '2018-03-26')
Resolution: (-0.000269494585236, 0.000269494585236)

LS8 Vietnam

Platform: 'LANDSAT_8'
Product: 'ls8_collection1_AMA_ingest'
Latitude: (10.513927001104687, 12.611133863411238)
Longitude: (106.79005909290998, 108.91906631627438)
Time: ('2014-01-14', '2016-12-21')
Resolution: (-0.000269494585236, 0.000269494585236)

LS7 Caqueta

Platform: 'LANDSAT_7'
Product: 'ls7_collection1_AMA_ingest'
Latitude: (0.000134747292617865, 1.077843593651382)
Longitude: (-74.91935994831539, -73.30266193148462)
Time: ('1999-08-21', '2018-03-25')
Resolution: (-0.000269494585236, 0.000269494585236)

LS7 Lake Baringo

Platform: 'LANDSAT_7'
Product: 'ls7_collection1_AMA_ingest'
Latitude: (0.4997747685, 0.7495947795)
Longitude: (35.9742163305, 36.473586859499996)
Time: ('2005-01-08', '2016-12-24')
Resolution: (-0.000269493, 0.000269493)

In [3]:
# CHANGE HERE >>>>>>>>>>>>>>>>>

# Select a product and platform
platform = "LANDSAT_8"
product = 'ls8_collection1_AMA_ingest'
resolution = (-0.000269494585236, 0.000269494585236)
output_crs = 'EPSG:4326'

Set extent information

You can change the values in this cell to specify the extent of the data cube you wish to analyse.

You should select a sub-sample from one of the four data cubes listed above. When subsampling, keep in mind that:

  • Your latitude and longitude bounds should be within the extents given.
  • Your area should be small to keep load times reasonable (less than 0.5 square degrees).
  • Your time period should be within the extents given.

You should format the variables as:

  • latitude = (min_latitude, max_latitude)
  • longitude = (min_longitude, max_longitude)
  • time_extents = (min_time, max_time), where each time has the format: 'YYYY-MM-DD'.
In [4]:
# CHANGE HERE >>>>>>>>>>>>>>>>>>

# Select a sub-region to analyse
latitude = (0.8684, 1.0684)
longitude = (-74.8409, -74.6409)
time_extents = ('2000-01-01', '2018-01-01')

View the region before loading

The next cell will allow you to view the area you'll be analysing by displaying a red bounding box on an interactive map. You can change the extents in the previous cell and rerun the display_map() command to see the resulting bounding box.

In [5]:
# The code below renders a map that can be used to view the analysis region.
from utils.data_cube_utilities.dc_display_map import display_map

display_map(latitude, longitude)
Out[5]:

Load the dataset and the required spectral bands or other parameters

The data is loaded by passing the product and area information to the dc.load() function. As a part of this load, we also specify the measurements we want in the form of the Landsat bands.

The load can take up to a few minutes, so please be patient.

In [6]:
# Load the data
landsat_dataset = dc.load(
    latitude=latitude,
    longitude=longitude,
    platform=platform,
    time=time_extents,
    product=product,
    output_crs=output_crs,
    resolution=resolution,
    measurements=(
        'red',
        'green',
        'blue',
        'nir',
        'swir1',
        'swir2',
        'pixel_qa'
    )
) 

It is often useful to print the loaded data to check the dimensions and data variables.

When looking at the dimensions, the numbers for latitude and longitude correspond to the number of pixels in each dimension and the number for time corresponds to the number of time steps.

In [7]:
# Displays an overview of the loaded data
print(landsat_dataset)
<xarray.Dataset>
Dimensions:    (latitude: 743, longitude: 743, time: 48)
Coordinates:
  * time       (time) datetime64[ns] 2013-05-06T15:15:22 2013-06-07T15:15:34 ...
  * latitude   (latitude) float64 1.068 1.068 1.068 1.068 1.067 1.067 1.067 ...
  * longitude  (longitude) float64 -74.84 -74.84 -74.84 -74.84 -74.84 -74.84 ...
Data variables:
    red        (time, latitude, longitude) int16 2457 2277 2212 2320 2412 ...
    green      (time, latitude, longitude) int16 2549 2388 2337 2432 2535 ...
    blue       (time, latitude, longitude) int16 2484 2298 2211 2320 2379 ...
    nir        (time, latitude, longitude) int16 4441 4291 4221 4304 4371 ...
    swir1      (time, latitude, longitude) int16 3193 3052 3053 3080 3173 ...
    swir2      (time, latitude, longitude) int16 2335 2211 2199 2230 2315 ...
    pixel_qa   (time, latitude, longitude) int32 480 480 480 480 480 480 480 ...
Attributes:
    crs:      EPSG:4326
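
If you would rather read these values programmatically than from the printout, the optional cell below shows one way to do so using xarray's sizes attribute (an illustrative addition, not part of the original workflow).

In [ ]:
# Optional: read the dimension sizes programmatically
n_times = landsat_dataset.sizes['time']
n_pixels = landsat_dataset.sizes['latitude'] * landsat_dataset.sizes['longitude']
print(f"{n_times} time steps of {n_pixels} pixels per scene")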

Masking out clouds

As part of the utilities for the Open Data Cube, we have defined a function to mask clouds based on the quality assurance information for Landsat. The function returns an xarray.DataArray object containing the mask. This can then be passed to the where() function, which masks the data.

In [8]:
from utils.data_cube_utilities.clean_mask import landsat_qa_clean_mask

cloud_mask = landsat_qa_clean_mask(landsat_dataset, platform=platform)

cleaned_dataset = landsat_dataset.where(cloud_mask)
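
For context, the optional cell below sketches roughly what such a mask looks like when built directly from the pixel_qa bit flags. It assumes the Landsat Collection 1 bit layout (bit 1 = clear, bit 2 = water) and is only an approximation; the landsat_qa_clean_mask() utility encapsulates the platform-specific logic and should be preferred.

In [ ]:
# Illustrative sketch only: an approximate clear-pixel mask built directly from
# the pixel_qa bit flags, assuming the Collection 1 layout (bit 1 = clear,
# bit 2 = water). Use landsat_qa_clean_mask() for the real analysis.
clear_bit, water_bit = 1, 2
approx_mask = ((landsat_dataset.pixel_qa & (1 << clear_bit)) > 0) | \
              ((landsat_dataset.pixel_qa & (1 << water_bit)) > 0)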

Time Series Water Detection Analysis

This section produces a time series output of the Australian Water Observations from Space (WOFS) results. The results show the percentage of time that each pixel is classified as water over the entire time series. BLUE = frequent water, RED = infrequent water.

The first step is to classify the dataset, which can be done with the wofs_classify() utility function.

In [9]:
from utils.data_cube_utilities.dc_water_classifier import wofs_classify

ts_water_classification = wofs_classify(landsat_dataset, clean_mask=cloud_mask)

The next step is to convert "no data" pixels to nan. A "no data" pixel has a value of -9999 in Landsat data.

In [10]:
ts_water_classification = ts_water_classification.where(ts_water_classification != -9999).astype(np.float16)

Finally, the percentage of time that a pixel is classified as water is calculated by taking the average classification value over time and multiplying it by 100. The mean calculation ignores nan values.

In [11]:
water_classification_percentages = (ts_water_classification.mean(dim=['time']) * 100).wofs.rename('water_classification_percentages')
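
As an optional sanity check (not part of the original workflow), the resulting values should lie between 0 and 100, with nan wherever a pixel was never observed cloud-free.

In [ ]:
# Optional sanity check: percentages should lie in the range [0, 100]
print(float(water_classification_percentages.min()),
      float(water_classification_percentages.max()))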

Exploring the results

After calculating the water classification percentage, we can plot it both as a 2-dimensional image and 1-dimensional summary.

The first step is to choose a colour map and change the colour of nan pixels to black. We choose to use the RdBu colour map to highlight water in blue and land in red.

In [12]:
# import color-scheme and set nans to black
from matplotlib.cm import RdBu
RdBu.set_bad('black', 1)

In the following figure, dark blue indicates pixels that experienced significant or constant water over the time series, while dark red indicates pixels that experienced little or no water over the time series.

You can adjust the figure size to avoid distortion. Use the latitude and longitude dimensions from the xarray description to get an idea for the desired aspect ratio. You'll need to add some space in the x-dimension to account for the presence of the colour bar.

In [13]:
# CHANGE HERE >>>>>>>>>>>>>>>>>

water_classification_percentages.plot(cmap=RdBu, figsize=(14, 12))
plt.show()

By taking the average classification value over the latitude and longitude, we can assess whether the fraction of water pixels has changed significantly over time. It should be noted that clouds can impact the statistical results. The water classification percentage can be displayed on either a linear scale or a logarithmic scale.

In [14]:
water_classification_mean_percentages = (ts_water_classification.mean(dim=['latitude', 'longitude']) * 100).wofs.rename('water_classification_percentages')

#Linear-scale plot
water_classification_mean_percentages.plot(figsize=(15,3), marker='o', linestyle='None')
plt.title("Percentage of water pixels over time (linear)")
plt.show()

#Logarithmic-scale plot
water_classification_mean_percentages.plot(figsize=(15,3), marker='o', linestyle='None')
plt.title("Percentage of water pixels over time (logarithmic)")
plt.gca().set_yscale('log')
plt.show()

Export to GeoTIFF

To perform further analysis, use the following cells to export the data in GeoTIFF format. This makes use of the data cube utility function export_slice_to_geotiff().

Before exporting, we'll construct an xarray dataset to store the water classification percentage data we created earlier.

In [15]:
# Save the water percentage data to a GeoTIFF
from utils.data_cube_utilities.import_export import export_slice_to_geotiff

# construct the xarray Dataset
dataset_to_export = xr.Dataset(coords=water_classification_percentages.coords, attrs=ts_water_classification.attrs)

# add the water classification percentages to the new xarray Dataset
dataset_to_export['wofs_pct'] = (water_classification_percentages/100).astype(np.float32)
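
It can be worth printing dataset_to_export to confirm that it contains the wofs_pct variable and the expected latitude/longitude coordinates before writing the GeoTIFF (an optional check, not part of the original workflow).

In [ ]:
# Optional: inspect the dataset before exporting
print(dataset_to_export)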

The export command in the following cell is commented out to avoid overwriting files. If you would like to export data, please change the filename and then uncomment the line.

In [16]:
# CHANGE HERE >>>>>>>>>>>>>>>>>>>>>>>>>

# export_slice_to_geotiff(dataset_to_export, 'geotiffs/WOFS_Percentage_demo.tif')

By default, exported files are saved in the geotiffs folder, which sits inside the dcal folder that this notebook is stored in. Use the following cell to list the contents of the geotiffs folder.

NOTE: Starting a command with ! allows you to run that command in the Jupyter environment's command line.

In [17]:
!ls -lah geotiffs/
total 12K
drwxrwsr-x 2 jovyan users 4.0K Apr  9 01:57 .
drwxrwsr-x 5 jovyan users 4.0K May 14 02:28 ..
-rw-rw-r-- 1 jovyan users   38 Mar 31 23:41 README.md