In this section, we will work with MODIS Vegetation Indices Version 6.1 data which are generated every 16 days at 250 meter (m) spatial resolution. This dataset provides Normalised Differnece Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) processed from the MODIS data. The source data was downloaded and processed via AρρEEARS which allows for pre-processing of the data into user-specified AOIs.
Here we will working with GeoTIFF files in EPSG:4326 projection and clipped to the state boundary of Karnataka, India. We will learn how to load all the files into a single XArray Dataset, apply a cloud-mask and extract a time-series of NDVI and EVI values.
The following blocks of code will install the required packages and download the datasets to your Colab environment.
%%capture
if 'google.colab' in str(get_ipython()):
!apt install libspatialindex-dev
!pip install fiona shapely pyproj rtree
!pip install geopandas
!pip install rioxarray
import datetime
import glob
import os
import re
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import geopandas as gpd
import xarray as xr
import rioxarray as rxr
import zipfile
data_folder = 'data'
output_folder = 'output'
if not os.path.exists(data_folder):
os.mkdir(data_folder)
if not os.path.exists(output_folder):
os.mkdir(output_folder)
def download(url):
filename = os.path.join(data_folder, os.path.basename(url))
if not os.path.exists(filename):
from urllib.request import urlretrieve
local, _ = urlretrieve(url, filename)
print('Downloaded ' + local)
filename = 'modis_vegetation_indices_2020.zip'
url = 'https://storage.googleapis.com/spatialthoughts-public-data/' + filename
download(url)
Downloaded data/modis_vegetation_indices_2020.zip
First we unzip and extract the images to a folder.
zipfile_path = os.path.join(data_folder, filename)
with zipfile.ZipFile(zipfile_path) as zf:
zf.extractall(data_folder)
The timestamp for the image is part of the filename. We write a function to parse the filename and extract the timestamps.
def path_to_datetimeindex(filepath):
filename = os.path.basename(filepath)
pattern = r'doy(\d+)'
match = re.search(pattern, filepath)
if match:
doy_value = match.group(1)
timestamp = datetime.datetime.strptime(doy_value, '%Y%j')
return timestamp
else:
print('Could not extract DOY from filename', filename)
timestamps = []
filepaths = []
files = os.path.join(data_folder, 'modis_vegetation_indices_2020', '*.tif')
for filepath in glob.glob(files):
timestamp = path_to_datetimeindex(filepath)
filepaths.append(filepath)
timestamps.append(timestamp)
unique_timestamps = sorted(set(timestamps))
Iterate through each file and read it using XArray. There are 3 files for each time-step. One file each for NDV
, EVI
and DetailedQA
information. We extract the first 2-bits of information from the DetailedQA
band into a new band named SummaryQA
. These 4 bands are combined into a single array for each time-step and then combine all the arrays into a single array with a time
dimension.
scenes = []
for timestamp in unique_timestamps:
ndvi_filepattern = r'NDVI_doy{}'.format(timestamp.strftime('%Y%j'))
evi_filepattern = r'EVI_doy{}'.format(timestamp.strftime('%Y%j'))
qa_filepattern = r'VI_Quality_doy{}'.format(timestamp.strftime('%Y%j'))
ndvi_filepath = [filepath for filepath in filepaths if re.search(ndvi_filepattern, filepath)][0]
evi_filepath = [filepath for filepath in filepaths if re.search(evi_filepattern, filepath)][0]
qa_filepath = [filepath for filepath in filepaths if re.search(qa_filepattern, filepath)][0]
ndvi_band = rxr.open_rasterio(ndvi_filepath, chunks={'x':512, 'y':512})
ndvi_band.name = 'NDVI'
evi_band = rxr.open_rasterio(evi_filepath, chunks={'x':512, 'y':512})
evi_band.name = 'EVI'
qa_band = rxr.open_rasterio(qa_filepath, chunks={'x':512, 'y':512})
qa_band.name = 'DetailedQA'
# First 2 bits are MODLAND QA Bits with summary information
# Extract the value of the 2-bits using bitwise AND operation
summary_qa = qa_band & 0b11
qa_band.name = 'SummaryQA'
# Pixel values need to be scaled with the scale_factor
scale_factor = 0.0001
scaled_bands = [ndvi_band * scale_factor, evi_band * scale_factor, qa_band, summary_qa]
scene = xr.concat(scaled_bands, dim='band')
scenes.append(scene)
time_var = xr.Variable('time', list(unique_timestamps))
time_series_scenes = xr.concat(scenes, dim=time_var)
time_series_scenes
time_series_scenes = time_series_scenes.assign_coords(band=['NDVI', 'EVI', 'DetailedQA', 'SummaryQA'])
time_series_scenes
As described in the MODIS VI User Guide MOD13Q1/A1 QA band contains pixel values representing overall pixel quality.
Value | Summary QA | Description |
---|---|---|
-1 | Fill/No Data | Not Processed |
0 | Good Data | Use with confidence |
1 | Marginal data | Useful, but look at other QA information |
2 | Snow/Ice | Target covered with snow/ice |
3 | Cloudy | Target not visible, covered with cloud |
We select all pixels with values 0 or 1 and mask all pixels with snow or cloud.
summary_qa = time_series_scenes.sel(band='SummaryQA')
time_series_scenes_masked = time_series_scenes.sel(band=['NDVI', 'EVI']).where(summary_qa <= 1)
time_series_scenes_masked
We extract the time-series at a specific X,Y coordinate.
time_series = time_series_scenes.sel(band=['NDVI', 'EVI']).interp(y=13.16, x=77.35)
This time-series has cloudy pixels which show up as anomalous NDVI and EVI values.
fig, ax = plt.subplots(1, 1)
fig.set_size_inches(15, 7)
time_series.plot.line(ax=ax, x='time', marker='o', linestyle='-', linewidth=1)
plt.show()
Let's plot the cloud-masked time-series.
time_series_masked = time_series_scenes_masked.sel(band=['NDVI', 'EVI']).interp(y=13.16, x=77.35)
fig, ax = plt.subplots(1, 1)
fig.set_size_inches(15, 7)
time_series_masked.plot.line(ax=ax, x='time', marker='o', linestyle='-', linewidth=1)
plt.show()
This looks much better, but we have gaps in the time-series due to masked cloudy-pixels. Replace them with linearly interpolated values.
time_series_masked_interpolated = time_series_masked.chunk(dict(time=-1)).interpolate_na('time')
fig, ax = plt.subplots(1, 1)
fig.set_size_inches(15, 7)
time_series_masked_interpolated.plot.line(ax=ax, x='time', marker='o', linestyle='-', linewidth=1)
plt.show()
We can set the series colors to our choice of colors using Matplotlib Cycler.
from cycler import cycler
import matplotlib as mpl
colorlist = ['#006837', '#78c679']
custom_cycler = cycler(color=colorlist)
fig, ax = plt.subplots(1, 1)
ax.set_prop_cycle(custom_cycler)
fig.set_size_inches(15, 7)
time_series_masked_interpolated.plot.line(ax=ax, x='time', marker='o', linestyle='-', linewidth=1)
plt.show()
We can use to_pandas()
method to convert the DataArray into a Pandas DataFrame. The masked pixels are NaN values that we can fill with a NoData value of -9999.
df = time_series_masked.to_pandas()
df = df.fillna(-9999)
df
Save the DataFrame as a CSV file.
output_filename = 'original_time_series.csv'
output_filepath = os.path.join(output_folder, output_filename)
df.reset_index().to_csv(output_filepath, index=False)
Create another CSV file interpolated_time_series.csv
containing the linearly interpolated NDVI and EVI values.