Understanding Australia's flood history is an important part of making better predictions about how we will be affected by flooding in the future.
To this end, Geoscience Australia developed the Australian Water Observations from Space (WOFS) algorithm. WOFS estimates how often water was observed at a particular location. This water detection algorithm is significantly better at identifying water than the Landsat QA water flag or the Normalized Difference Water Index (NDWI).
For more information, visit this website: http://www.ga.gov.au/scientific-topics/hazards/flood/wofs
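For comparison, NDWI is a single normalised band ratio. In its common McFeeters form it is computed from the green and near-infrared (NIR) bands, and positive values are usually taken to suggest open water. WOFS, by contrast, classifies each observation using several Landsat bands. The sketch below computes NDWI for two hypothetical pixels. The function name and the reflectance values are illustrative assumptions.

```python
import numpy as np

def ndwi(green, nir):
    """NDWI in the McFeeters form: (green - nir) / (green + nir)."""
    green = np.asarray(green, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    return (green - nir) / (green + nir)

# Hypothetical scaled surface-reflectance values
water_like = ndwi(green=800, nir=300)   # water reflects more green than NIR
land_like = ndwi(green=900, nir=3000)   # vegetation reflects strongly in NIR
print(float(water_like), float(land_like))
```

A single threshold on one index like this is easy to compute, but it ignores the other bands that the WOFS classifier uses.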
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')
# Load Data Cube Configuration
import datacube
dc = datacube.Datacube(app='my_app')
# Import Data Cube API
import utils.data_cube_utilities.data_access_api as dc_api
api = dc_api.DataAccessApi()
# Import other required packages
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
We've listed the available ingested data that you can explore in the ODC Sandbox. The latitude, longitude and time ranges correspond to the boundaries of the ingested data cubes. You'll be able to explore sub-samples of these cubes. You'll also need to provide the platform, product and resolution information for the cube you're subsampling.
Platform: 'LANDSAT_8'
Product: 'ls8_collection1_AMA_ingest'
Latitude: (0.000134747292617865, 1.077843593651382)
Longitude: (-74.91935994831539, -73.30266193148462)
Time: ('2013-04-13', '2018-03-26')
Resolution: (-0.000269494585236, 0.000269494585236)
Platform: 'LANDSAT_8'
Product: 'ls8_collection1_AMA_ingest'
Latitude: (10.513927001104687, 12.611133863411238)
Longitude: (106.79005909290998, 108.91906631627438)
Time: ('2014-01-14', '2016-12-21')
Resolution: (-0.000269494585236, 0.000269494585236)
Platform: 'LANDSAT_7'
Product: 'ls7_collection1_AMA_ingest'
Latitude: (0.000134747292617865, 1.077843593651382)
Longitude: (-74.91935994831539, -73.30266193148462)
Time: ('1999-08-21', '2018-03-25')
Resolution: (-0.000269494585236, 0.000269494585236)
Platform: 'LANDSAT_7'
Product: 'ls7_collection1_AMA_ingest'
Latitude: (0.4997747685, 0.7495947795)
Longitude: (35.9742163305, 36.473586859499996)
Time: ('2005-01-08', '2016-12-24')
Resolution: (-0.000269493, 0.000269493)
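The resolution values above are in degrees because the data is loaded on a latitude/longitude grid (EPSG:4326). The sketch below gives a rough conversion to metres. It assumes a spherical Earth with the WGS 84 equatorial radius, which is an approximation introduced here for illustration. Under that assumption, 0.000269494585236 degrees corresponds to about 30 m, the native Landsat pixel size. The east-west pixel width shrinks with the cosine of latitude. The helper name is hypothetical.

```python
import math

# Spherical-Earth approximation using the WGS 84 equatorial radius (an assumption)
EARTH_RADIUS_M = 6_378_137
METRES_PER_DEGREE = 2 * math.pi * EARTH_RADIUS_M / 360  # roughly 111,319.5 m

def pixel_size_m(res_deg, latitude_deg=0.0):
    """Hypothetical helper: approximate (north-south, east-west) pixel size in metres."""
    north_south = abs(res_deg) * METRES_PER_DEGREE
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west

# Resolution of the LANDSAT_8 cubes; the sign only encodes the axis direction
ns, ew = pixel_size_m(-0.000269494585236, latitude_deg=11.5)
print(f"{ns:.1f} m (N-S) x {ew:.1f} m (E-W) at 11.5 degrees latitude")
```

At 11.5 degrees latitude this prints roughly 30.0 m by 29.4 m. Pixels on a degree grid are therefore close to, but not exactly, square on the ground.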
# CHANGE HERE >>>>>>>>>>>>>>>>>
# Select a product and platform
platform = "LANDSAT_8"
product = 'ls8_collection1_AMA_ingest'
resolution = (-0.000269494585236, 0.000269494585236)
output_crs = 'EPSG:4326'
You can change the values in this cell to specify the extent of the data cube you wish to analyse.
You should select a sub-sample from one of the four data cubes listed above. When subsampling, keep in mind that your latitude and longitude ranges should lie within the extents of the chosen cube, and that you should format the variables as:
latitude = (min_latitude, max_latitude)
longitude = (min_longitude, max_longitude)
time_extents = (min_time, max_time)
where each time has the format 'YYYY-MM-DD'.
# CHANGE HERE >>>>>>>>>>>>>>>>>>
# Select a sub-region to analyse
latitude = (0.8684, 1.0684)
longitude = (-74.8409, -74.6409)
time_extents = ('2000-01-01', '2018-01-01')
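Before loading, it can help to confirm two things about the chosen latitude and longitude ranges: that they are in (min, max) order, and that they fall inside the cube you picked. The helper below does this. It is hypothetical (not part of the data cube utilities) and hard-codes the extents of the first cube listed above. Time ranges are not checked, because the load simply returns the scenes that exist in the requested period. That is why the output below starts in 2013, even though the request starts in 2000.

```python
# Extents of the first cube listed above (LANDSAT_8, ls8_collection1_AMA_ingest)
CUBE_LATITUDE = (0.000134747292617865, 1.077843593651382)
CUBE_LONGITUDE = (-74.91935994831539, -73.30266193148462)

def check_extent(name, requested, available):
    """Hypothetical helper: raise ValueError unless `requested` is a (min, max) range inside `available`."""
    low, high = requested
    if low > high:
        raise ValueError(f"{name}: expected (min, max), got {requested}")
    if low < available[0] or high > available[1]:
        raise ValueError(f"{name}: {requested} lies outside the cube extent {available}")

check_extent("latitude", (0.8684, 1.0684), CUBE_LATITUDE)
check_extent("longitude", (-74.8409, -74.6409), CUBE_LONGITUDE)
```

In the notebook you would pass your own variables, for example check_extent("latitude", latitude, CUBE_LATITUDE). A reversed or out-of-range tuple then fails early instead of producing an unexpected load.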
The next cell will allow you to view the area you'll be analysing by displaying a red bounding box on an interactive map. You can change the extents in the previous cell and rerun the display_map() command to see the resulting bounding box.
# The code below renders a map that can be used to view the analysis region.
from utils.data_cube_utilities.dc_display_map import display_map
display_map(latitude, longitude)
The data is loaded by passing the product and area information to the dc.load() function. As part of this load, we also specify the measurements we want in the form of the Landsat bands.
The load can take up to a few minutes, so please be patient.
# Load the data
landsat_dataset = dc.load(
    latitude=latitude,
    longitude=longitude,
    platform=platform,
    time=time_extents,
    product=product,
    output_crs=output_crs,
    resolution=resolution,
    measurements=(
        'red',
        'green',
        'blue',
        'nir',
        'swir1',
        'swir2',
        'pixel_qa'
    )
)
It is often useful to print the loaded data to check the dimensions and data variables.
When looking at the dimensions, the numbers for latitude and longitude correspond to the number of pixels in each dimension and the number for time corresponds to the number of time steps.
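As a rough cross-check of the latitude and longitude dimensions, the number of pixels along each axis is approximately the width of the requested range divided by the pixel size. The helper name is hypothetical, and rounding up is a simplification made here. dc.load snaps the request to the cube's pixel grid, so the exact count can differ by one.

```python
import math

# Sub-region and pixel size used in this notebook
lat_extent = (0.8684, 1.0684)
lon_extent = (-74.8409, -74.6409)
PIXEL_DEG = 0.000269494585236

def pixels_across(extent, pixel_deg):
    """Hypothetical helper: whole pixels needed to cover a (min, max) range."""
    return math.ceil((extent[1] - extent[0]) / pixel_deg)

n_lat = pixels_across(lat_extent, PIXEL_DEG)
n_lon = pixels_across(lon_extent, PIXEL_DEG)
print(n_lat, n_lon)
```

Both axes come out at 743 pixels (0.2 degrees divided by the pixel size is about 742.1). This is consistent with the dimensions printed below.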
# Displays an overview of the loaded data
print(landsat_dataset)
<xarray.Dataset>
Dimensions:    (latitude: 743, longitude: 743, time: 48)
Coordinates:
  * time       (time) datetime64[ns] 2013-05-06T15:15:22 2013-06-07T15:15:34 ...
  * latitude   (latitude) float64 1.068 1.068 1.068 1.068 1.067 1.067 1.067 ...
  * longitude  (longitude) float64 -74.84 -74.84 -74.84 -74.84 -74.84 -74.84 ...
Data variables:
    red        (time, latitude, longitude) int16 2457 2277 2212 2320 2412 ...
    green      (time, latitude, longitude) int16 2549 2388 2337 2432 2535 ...
    blue       (time, latitude, longitude) int16 2484 2298 2211 2320 2379 ...
    nir        (time, latitude, longitude) int16 4441 4291 4221 4304 4371 ...
    swir1      (time, latitude, longitude) int16 3193 3052 3053 3080 3173 ...
    swir2      (time, latitude, longitude) int16 2335 2211 2199 2230 2315 ...
    pixel_qa   (time, latitude, longitude) int32 480 480 480 480 480 480 480 ...
Attributes:
    crs:      EPSG:4326
As part of the utilities for the Open Data Cube, we have defined a function that masks clouds using the Landsat quality assurance (pixel_qa) information. The function returns an xarray.DataArray object containing the mask, which is True for clear pixels. This mask can then be passed to the dataset's where() method, which keeps the values where the mask is True and replaces all other pixels with NaN.
from utils.data_cube_utilities.clean_mask import landsat_qa_clean_mask
cloud_mask = landsat_qa_clean_mask(landsat_dataset, platform=platform)
cleaned_dataset = landsat_dataset.where(cloud_mask)
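The sketch below illustrates what such a clean mask does. It builds a mask directly from pixel_qa bits and applies it with where(). It assumes the Landsat 8 Collection 1 pixel_qa layout, in which bit 1 flags clear pixels and bit 2 flags water. It is not necessarily how landsat_qa_clean_mask is implemented, and the function name and sample values are hypothetical. Under that layout, the value 480 in the printout above (256 + 128 + 64 + 32) has the cloud bit (bit 5) set and neither the clear nor the water bit, so those pixels would be masked.

```python
import numpy as np
import xarray as xr

# Bit positions assumed from the Landsat 8 Collection 1 pixel_qa layout
CLEAR_BIT = 1
WATER_BIT = 2

def clear_or_water(pixel_qa):
    """Hypothetical helper: True where pixel_qa flags clear land or water."""
    clear = (pixel_qa & (1 << CLEAR_BIT)) != 0
    water = (pixel_qa & (1 << WATER_BIT)) != 0
    return clear | water

# Hypothetical 2x2 scene: 322 = clear, 324 = water, 480 = cloud, 1 = fill
dims = ("latitude", "longitude")
qa = xr.DataArray(np.array([[322, 324], [480, 1]], dtype=np.int32), dims=dims)
red = xr.DataArray(np.array([[500, 300], [2457, 0]], dtype=np.int16), dims=dims)

mask = clear_or_water(qa)
cleaned_red = red.where(mask)  # masked pixels become NaN, so the dtype is promoted to float
print(cleaned_red.values)
```

Because where() inserts NaN, integer bands become floating point after masking. This is also why later steps treat NaN as "no data".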
Time series output of the Australian Water Observations from Space (WOFS) results. The results show the percentage of time that each pixel is classified as water over the entire time series: blue indicates frequent water and red indicates infrequent water.
The first step is to classify the dataset, which can be done with the wofs_classify() utility function.
from utils.data_cube_utilities.dc_water_classifier import wofs_classify
ts_water_classification = wofs_classify(landsat_dataset, clean_mask=cloud_mask)
The next step is to convert "no data" pixels to NaN. A "no data" pixel has a value of -9999 in Landsat data.
ts_water_classification = ts_water_classification.where(ts_water_classification != -9999).astype(np.float16)
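The -9999 sentinel is converted to NaN, rather than left in place, because otherwise it would be averaged as if it were a real observation. A small hypothetical single-pixel series makes the difference concrete. It uses float32 rather than the float16 used above, which does not change the point being illustrated.

```python
import numpy as np
import xarray as xr

NO_DATA = -9999  # "no data" value described above

# Hypothetical classification of one pixel over 4 time steps: 1 = water, 0 = not water
wofs = xr.DataArray(np.array([1, 0, NO_DATA, 1], dtype=np.int16), dims="time")

naive_mean = float(wofs.mean())                            # -9999 is counted as a value
clean = wofs.where(wofs != NO_DATA).astype(np.float32)     # -9999 -> NaN
clean_mean = float(clean.mean())                           # NaN is skipped by default
print(naive_mean, clean_mean)
```

Here the naive mean is -2499.25, while the NaN-aware mean is 2/3: water in two of the three valid observations.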
Finally, the percentage of time that a pixel is classified as water is calculated by taking the average classification value over time and multiplying it by 100. The mean calculation ignores NaN values.
water_classification_percentages = (ts_water_classification.mean(dim=['time']) * 100).wofs.rename('water_classification_percentages')
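The per-pixel calculation can be checked on a tiny hypothetical array. Each pixel's percentage is taken over its own valid (non-NaN) observations only. A pixel with no valid observations at all stays NaN, which is why NaN pixels need a display colour of their own.

```python
import numpy as np
import xarray as xr

# Hypothetical cleaned classification: 3 time steps over a 1x3 grid.
# 1.0 = water, 0.0 = not water, NaN = no data (e.g. masked by cloud).
toy_classification = xr.DataArray(
    np.array([
        [[1.0, 0.0, np.nan]],
        [[1.0, np.nan, np.nan]],
        [[0.0, 0.0, np.nan]],
    ]),
    dims=("time", "latitude", "longitude"),
)

# Percentage of valid observations in which each pixel was classified as water
toy_pct = toy_classification.mean(dim="time") * 100
print(toy_pct.values)
```

The first pixel is water in 2 of 3 observations (about 66.7%). The second is never water in its 2 valid observations (0%). The third has no valid observations and remains NaN.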
After calculating the water classification percentage, we can plot the results both as a 2-dimensional image and as a 1-dimensional summary over time.
The first step is to choose a colour map and change the colour of NaN pixels to black. We use the RdBu colour map to highlight water in blue and land in red.
# import color-scheme and set nans to black
from matplotlib.cm import RdBu
RdBu.set_bad('black', 1)
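RdBu.set_bad changes the shared RdBu object in matplotlib.cm, so the black NaN colour carries over to any later plot that uses that object. The variant below works on a private copy instead. This is a suggestion, not part of the original notebook, and the name water_cmap is hypothetical. It behaves the same for this plot: low values map to the red end, high values map to the blue end, and NaN maps to black.

```python
import copy

import matplotlib.pyplot as plt
import numpy as np

# Private copy, so set_bad does not alter the shared RdBu colour map
water_cmap = copy.copy(plt.get_cmap("RdBu"))
water_cmap.set_bad("black", 1)

low = water_cmap(0.0)     # 0% water -> red end of the scale
high = water_cmap(1.0)    # 100% water -> blue end of the scale
bad = water_cmap(np.nan)  # NaN (no valid observations) -> black
print(low, high, bad)
```

To use it, pass cmap=water_cmap to the plot call in place of RdBu.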
In the following figure, dark blue indicates pixels that were frequently or permanently covered by water over the time series, whereas dark red indicates pixels that experienced little or no water over the time series.
You can adjust the figure size to avoid distortion. Use the latitude and longitude dimensions from the xarray description to get an idea of the desired aspect ratio. You'll need to add some space in the x-dimension to account for the colour bar.
# CHANGE HERE >>>>>>>>>>>>>>>>>
water_classification_percentages.plot(cmap=RdBu, figsize=(14, 12))
plt.show()
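The aspect-ratio advice above can be turned into a small calculation. The helper below is hypothetical. It scales the figure width by the ratio of longitude to latitude pixels, so the image keeps roughly square pixels, and reserves a fixed amount of extra width for the colour bar (2 inches is an assumption). For the 743 by 743 grid printed earlier, it reproduces the figsize=(14, 12) used above.

```python
def figsize_for(n_lat, n_lon, height=12.0, colorbar_width=2.0):
    """Hypothetical helper: (width, height) in inches for an undistorted map plus colour bar."""
    return (height * n_lon / n_lat + colorbar_width, height)

print(figsize_for(743, 743))   # square grid from this notebook
print(figsize_for(500, 1000))  # a grid twice as wide as it is tall
```

Because the axes do not fill the whole figure, the result is only a starting point for small manual adjustments.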
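By averaging the classification values over latitude and longitude for each time step, we can assess whether the fraction of water pixels has changed over time. Note that clouds can affect these statistics. Pixels masked as cloud are treated as no data and excluded from the mean, so each time step's percentage is computed only over the pixels that were clear, and heavily clouded scenes are summarised from very few pixels. The water classification percentage can be displayed on either a linear or a logarithmic scale.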
water_classification_mean_percentages = (ts_water_classification.mean(dim=['latitude', 'longitude']) * 100).wofs.rename('water_classification_percentages')
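A small hypothetical example makes the cloud caveat above concrete. Because the spatial mean skips NaN pixels, a mostly cloudy scene is summarised from very few pixels and can produce an extreme percentage. One possible safeguard is to count the fraction of valid pixels per time step and drop poorly covered scenes. The 50% threshold below is an arbitrary choice for illustration, not part of the original workflow.

```python
import numpy as np
import xarray as xr

# Hypothetical classification: 2 time steps over a 2x2 grid; NaN = cloud-masked
toy_ts = xr.DataArray(
    np.array([
        [[1.0, 0.0], [0.0, 0.0]],           # clear scene: 1 of 4 pixels is water
        [[1.0, np.nan], [np.nan, np.nan]],  # cloudy scene: only 1 valid pixel
    ]),
    dims=("time", "latitude", "longitude"),
)

water_pct = toy_ts.mean(dim=["latitude", "longitude"]) * 100
valid_fraction = toy_ts.notnull().astype(float).mean(dim=["latitude", "longitude"])

MIN_VALID = 0.5  # hypothetical minimum share of valid pixels per time step
filtered_pct = water_pct[valid_fraction.values >= MIN_VALID]
print(water_pct.values, valid_fraction.values, filtered_pct.values)
```

The cloudy scene reports 100% water from a single clear pixel. The coverage filter removes it and keeps only the 25% estimate from the clear scene.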
# Linear-scale plot
water_classification_mean_percentages.plot(figsize=(15, 3), marker='o', linestyle='None')
plt.title("Percentage of water pixels over time (linear)")
plt.show()
# Logarithmic-scale plot
water_classification_mean_percentages.plot(figsize=(15, 3), marker='o', linestyle='None')
plt.title("Percentage of water pixels over time (logarithmic)")
plt.gca().set_yscale('log')
plt.show()
To perform further analysis, use the following cells to export the data in GeoTIFF format. This makes use of the data cube utility function export_slice_to_geotiff().
Before exporting, we'll construct an xarray dataset to store the water classification percentage data we created earlier.
# Save the water percentage data to a GeoTIFF
from utils.data_cube_utilities.import_export import export_slice_to_geotiff
# construct the xarray Dataset
dataset_to_export = xr.Dataset(coords=water_classification_percentages.coords, attrs=ts_water_classification.attrs)
# add the water classification percentages, rescaled to a 0-1 fraction, to the new xarray Dataset
dataset_to_export['wofs_pct'] = (water_classification_percentages/100).astype(np.float32)
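Alongside the pixel values, a GeoTIFF stores an affine transform that ties pixel (row, column) positions to map coordinates. The sketch below shows what any exporter has to derive from the Dataset's coordinates. It computes a GDAL-style geotransform (x origin, pixel width, 0, y origin, 0, pixel height) from evenly spaced coordinates. It assumes the coordinates are pixel centres, and it does not claim that export_slice_to_geotiff works this way internally. The function name and sample coordinates are hypothetical.

```python
import numpy as np

# Hypothetical pixel-centre coordinates, with latitude decreasing down the rows as in the printout above
centre_lat = np.array([1.068, 1.058, 1.048])
centre_lon = np.array([-74.84, -74.83])

def geotransform_from_centres(lat, lon):
    """Hypothetical helper: GDAL-style (x0, dx, 0, y0, 0, dy) from evenly spaced pixel centres."""
    dx = lon[1] - lon[0]
    dy = lat[1] - lat[0]   # negative when latitude decreases down the rows
    x0 = lon[0] - dx / 2   # shift from the first pixel centre to its outer edge
    y0 = lat[0] - dy / 2
    return (x0, dx, 0.0, y0, 0.0, dy)

gt = geotransform_from_centres(centre_lat, centre_lon)
print(gt)
```

The half-pixel shift matters: using the first centre coordinate directly as the origin would offset the exported raster by half a pixel.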
The export command on the following line is commented out to avoid overwriting files. If you would like to export data, please change the filename before uncommenting the next line.
# CHANGE HERE >>>>>>>>>>>>>>>>>>>>>>>>>
# export_slice_to_geotiff(dataset_to_export, 'geotiffs/WOFS_Percentage_demo.tif')
By default, the files are saved in the geotiffs folder, which sits inside the dcal folder that this notebook is stored in. Use the following cell to list the contents of the geotiffs folder.
NOTE: Starting a command with ! allows you to run that command in the Jupyter environment's command line.
!ls -lah geotiffs/
total 12K
drwxrwsr-x 2 jovyan users 4.0K Apr 9 01:57 .
drwxrwsr-x 5 jovyan users 4.0K May 14 02:28 ..
-rw-rw-r-- 1 jovyan users   38 Mar 31 23:41 README.md