WUR GRS-51806 Geoscripting

Week 3 (Python): Raster handling with Python

Python modules for raster data handling

The modules used below are:

  • gdal: bindings to the GDAL library which is part of the osgeo library
  • numpy for array calculations.

These two libraries are the base of all raster processing in python!

There are additional libraries that offer wrapper functions for gdal and provide additional raster procssesing functionality:

  • mapbox rasterio, see a NDVI tutorial
  • rios, a set of Python modules which makes it easy to write raster processing code in Python

Many important processing methods are availbale through additional libraries:

  • python interface for Orfeo toolbox (otb), e.g. segmentation
  • rsgislib e.g. segmentation
  • scikit-image for image interpretation
  • scikit-learn for machine learning


In the below example we will use a Aster image which you can download from the following dropbox. The data is also available here.

Reading, deriving NDVI, and writing raster data

Let's get started and we will go through the following steps:

  • Open the Aster image (ERDAS *.img format)
  • Read in the image data as an array
  • Derive the NDVI using array calculations
  • Write the resulting file as a *.tif (i.e. GeoTIFF file)
  • Close all files
In [1]:
# import modules
from osgeo import gdal
from osgeo.gdalconst import GA_ReadOnly, GDT_Float32
import numpy as np

# open file and print info about the file
# the "../" refers to the parent directory of my working directory

filename = '../data/ospy_data5/aster.img'
dataSource = gdal.Open(filename, GA_ReadOnly)

print "\nInformation about " + filename 
print "Driver: ", dataSource.GetDriver().ShortName,"/", \
print "Size is ",dataSource.RasterXSize,"x",dataSource.RasterYSize, \

print '\nProjection is: ', dataSource.GetProjection()

print "\nInformation about the location of the image and the pixel size:"
geotransform = dataSource.GetGeoTransform()
if not geotransform is None:
    print 'Origin = (',geotransform[0], ',',geotransform[3],')'
    print 'Pixel Size = (',geotransform[1], ',',geotransform[5],')'
Information about ../data/ospy_data5/aster.img
Driver:  HFA / Erdas Imagine Images (.img)
Size is  5665 x 5033 x 3

Projection is:  PROJCS["UTM Zone 12, Northern Hemisphere",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],TOWGS84[0,0,0,0,0,0,0],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-111],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["Meter",1],AUTHORITY["EPSG","32612"]]

Information about the location of the image and the pixel size:
Origin = ( 419976.5 , 4662422.5 )
Pixel Size = ( 15.0 , -15.0 )
In [2]:
# Read data into an array
band2Arr = dataSource.GetRasterBand(2).ReadAsArray(0,0,dataSource.RasterXSize, dataSource.RasterYSize)
band3Arr = dataSource.GetRasterBand(3).ReadAsArray(0,0,dataSource.RasterXSize, dataSource.RasterYSize)
print type(band2Arr)

# set the data type

# Derive the NDVI
mask = np.greater(band3Arr+band2Arr,0)

# set np.errstate to avoid warning of invalid values (i.e. NaN values) in the divide 
with np.errstate(invalid='ignore'):
    ndvi = np.choose(mask,(-99,(band3Arr-band2Arr)/(band3Arr+band2Arr)))
print "NDVI min and max values", ndvi.min(), ndvi.max()
# Check the real minimum value
print ndvi[ndvi>-99].min()
<type 'numpy.ndarray'>
NDVI min and max values -99.0 1.0
In [3]:
# Write the result to disk
driver = gdal.GetDriverByName('GTiff')
outDataSet=driver.Create('../data/ndvi.tif', dataSource.RasterXSize, dataSource.RasterYSize, 1, GDT_Float32)
outBand = outDataSet.GetRasterBand(1)

# set the projection and extent information of the dataset

# Finally let's save it... or like in the OGR example flush it

Let's check the resulting file via Bash (all commands in the current document with a ! in front are run via the system shell, which is Bash). Below we call the gdalinfo command of the GDAL C++ library directly. If you want to learn more about gdalinfo go to: http://www.gdal.org/gdalinfo.html

In [4]:
!gdalinfo ../data/ndvi.tif
Driver: GTiff/GeoTIFF
Files: ../data/ndvi.tif
Size is 5665, 5033
Coordinate System is:
PROJCS["WGS 84 / UTM zone 12N",
    GEOGCS["WGS 84",
            SPHEROID["WGS 84",6378137,298.257223563,
Origin = (419976.500000000000000,4662422.500000000000000)
Pixel Size = (15.000000000000000,-15.000000000000000)
Image Structure Metadata:
Corner Coordinates:
Upper Left  (  419976.500, 4662422.500) (111d58' 4.52"W, 42d 6'35.34"N)
Lower Left  (  419976.500, 4586927.500) (111d57'27.92"W, 41d25'47.74"N)
Upper Right (  504951.500, 4662422.500) (110d56'24.38"W, 42d 6'49.98"N)
Lower Right (  504951.500, 4586927.500) (110d56'26.64"W, 41d26' 2.04"N)
Center      (  462464.000, 4624675.000) (111d27' 5.89"W, 41d46'22.91"N)
Band 1 Block=5665x1 Type=Float32, ColorInterp=Gray
  NoData Value=-99

Important here is that the gdalinfo contains information about the projection (via SetProjection) and also about the corner coordinates (via SetGeoTransform).

Reproject the NDVI image to Lat/long

Reprojecting a raster image can be done by

  • calling the gdalwarp function of the GDAL C++ library directly
  • using Python and the gdal module

Using gdalwarp

The easiest way to reproject a raster file is to use GDAL's gdalwarp tool. As an example, we will reproject the above NDVI image derived earlier into latitude/longitude (WGS84).

  • -t_srs "EPSG:4326" is the CRS to reproject to, i.e. lat/long, know by its EPSG code. The CRS to reproject from is already specified in the source data set, see above.
  • data/ndvi.tif is the input dataset.
  • data/ndvi_ll.tif is the output dataset (the output format is here automatically set by the extension i.e. GeoTIFF).

Question 1: What is the CRS the image will be reprojected from?

In [5]:
# via the Shell
!gdalwarp -t_srs "EPSG:4326" ../data/ndvi.tif ../data/ndvi_ll.tif
Creating output file that is 6334P x 4215L.
Processing input file ../data/ndvi.tif.
Using internal nodata values (e.g. -99) for image ../data/ndvi.tif.
Copying nodata values from source ../data/ndvi.tif to destination ../data/ndvi_ll.tif.
0...10...20...30...40...50...60...70...80...90...100 - done.
In [6]:
# Let's check what the result is
!gdalinfo ../data/ndvi_ll.tif
Driver: GTiff/GeoTIFF
Files: ../data/ndvi_ll.tif
Size is 6334, 4215
Coordinate System is:
        SPHEROID["WGS 84",6378137,298.257223563,
Origin = (-111.967923047226890,42.113899650752344)
Pixel Size = (0.000162266608994,-0.000162266608994)
Image Structure Metadata:
Corner Coordinates:
Upper Left  (-111.9679230,  42.1138997) (111d58' 4.52"W, 42d 6'50.04"N)
Lower Left  (-111.9679230,  41.4299459) (111d58' 4.52"W, 41d25'47.81"N)
Upper Right (-110.9401263,  42.1138997) (110d56'24.45"W, 42d 6'50.04"N)
Lower Right (-110.9401263,  41.4299459) (110d56'24.45"W, 41d25'47.81"N)
Center      (-111.4540247,  41.7719228) (111d27'14.49"W, 41d46'18.92"N)
Band 1 Block=6334x1 Type=Float32, ColorInterp=Gray
  NoData Value=-99

To reproject a raster from within Python is not as straight-forward... Have a look at the following links, and use the internet to find a way!

Visualise the result

Let's see the output of the reprojected NDVI image.

In [7]:
# Notebook magic to select the plotting method
# Change to inline to plot within this notebook
#%matplotlib inline 
%matplotlib inline
from osgeo import gdal
import matplotlib.pyplot as plt
In [8]:
# Open image
dsll = gdal.Open("../data/ndvi_ll.tif")

# Read raster data
ndvi = dsll.ReadAsArray(0, 0, dsll.RasterXSize, dsll.RasterYSize)

# Now plot the raster data using gist_earth palette
plt.imshow(ndvi, interpolation='nearest', vmin=0, cmap=plt.cm.gist_earth)

dsll = None

Check in R

We can quickly check what we have done with R (go now to your Rstudio, or R terminal) and try the following code:

a <- brick('../data/ospy_data5/aster.img')
plotRGB(a, 3, 2, 1, stretch='hist')
b <- raster('../data/ndvi.tif')
hist(b, 1, maxpixels=500, plot=TRUE)

Exercise: NDWI and reprojection in python

For this scripting challenge I have downloaded a Landsat image with all bands process to surface reflectance (Level 1T). You can download it from here. Unzip the file and you will see that it contains all the invidual bands.

Now, write a well-structured and documented script where you define a function to derive the Normalised difference water index (NDWI) and derive it from the landsat image and reproject the image to Lat/Long WGS84 (hint: use GDAL from Bash in your notebook).

NDWI = band 3 - band 5 / band 3 + band 5 (more info about NDWI)

A clean and well structured script is critical here.

Submit until 9:30 next morning. Create an IPython Notebook script, put it on GitHub and post the link on Blackboard!

In [ ]: