A key goal of the Integrated Monitoring Program (IMP) is to combine information on specific environmental varibles so as to maximise the state of knowledge relating to reef health. In addition, it is important that this information be presented spatially, with uncertainties clearly represented, so that gaps or overlaps in monitoring can be identified.
Here, as a case example, we implement a Gaussian Process (GP) model in space and time that accomplishes both both goals for total nitrogen (TN) - combining information from multiple sources in space and time. General details on GP implementation in PyMC are available here.
The first step in this process is to import the two datasets that contain TSS information, namely from The Centre for Tropical Water & Aquatic Ecosystem Research at James Cook University (TropWater) and from the Great Barrier Reef Marine Park Authorty's Marine Monitoring Program (MMP).
Figure 1: IMP data for Mackay region. IMP data near Mackay, QLD (black triangle) including AIMS Long Term Monitoring Program (blue dots), TropWater monitoring (pink dots); and GBRMPA MMP (red dots) for the years 2005-2013.
# Import packages
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pymc as pm
from pymc.gp.cov_funs import matern
import pymc.gp as gp
# Import MMP data
mmp = pd.read_csv('mmp.csv')
# Import TropWater data
tropwater = pd.read_csv('tropwater.csv')
Water quality is typically reported on a quarterly bassis, with the water quality year running October to October each year. Although we wished to look at Q2 (January to March), which corresponds to the rainy season and the bulk of the TropWater data, MMP data is from Q3 (April to June) or Q4 (July to September), obfuscating any meaninful comparison for reporting quarters. Furthermore, because these data are so sparse - there is a maximum of 79 observations in a given year (TropWater in 2007) - we have to make the grossly simplifying assumption that all observations of TN between 2005 and 2013 are measuring the same thing.
With this in mind, we can import the corrdinates of each observation and the observed TN value:
mmpTN = np.array([mmp.Total_Nitrogen.values,mmp.LATITUDE.values,mmp.LONGITUDE.values]).T
tropwaterTN = np.array([tropwater.TN_uM.values,tropwater.LATITUDE.values,tropwater.LONGITUDE.values]).T
And then concatenate them into a single set of TN and coordinate data for the GP model:
TN = np.r_[mmpTN,tropwaterTN]
Spatial GP models without covariates are primiarily defined by their covariance structure and here we will use the flexible Matern covariance function, which has three underlying parameters: amplitude, governing the peakedness of the response surface; scale, governing the relatedness of adjacent datapoints; and difference degree, governing the level of smoothing over the GP surface.
First the difference degree prior, for which we choose arbitrarily wide Uniform priors, relying on posterior checks to ensure we're not unduly limiting the inference space:
diff_degree = pm.Uniform('diff_degree',0.1,4,value=1)
Next the scale parameter, for which we will take a similarly broad approach, using a Uniform prior:
scale = pm.Uniform('scale',0,10,value=1.0)
Additionally the amplitude parameter, the prior defined as a broad Exponential:
amp = pm.Exponential('amp',1,value=1)
Combined together, these hyperparameters give rise to the prior covariance:
# Prior covariance function
@pm.deterministic
def C(amp=amp, scale=scale, diff_degree=diff_degree):
return gp.Covariance(matern.euclidean,
diff_degree = diff_degree,
amp = amp, scale = scale)
With the covariance structure in place, the next step is to define the mean function, which in this case is just the overall mean:
# Overall mean
mu = pm.Normal('mu', 0, 1e-5, value=0)
# Mean function, which is constant
def constant(x, val):
return np.zeros(x.shape[:-1], dtype=float) + val
# Wrap mean function for GP submodel
@pm.deterministic
def M(mu=mu):
return gp.Mean(constant, val=mu)
To set up the GP model we need to define the coordinates over which our observations will be defined, called the mesh. In our case, this is the unique lat/lon coordinates of the observations:
# Lat/lon for observations
obs_coords = TN.T[1:].T
# Unique lat/lon for points on the mesh
mesh = np.array(list(set([tuple(x) for x in obs_coords])))
Next we combine the GP prior covariance function and mean function into the GP prior:
# GP prior
TN_prior = gp.GPSubmodel('TN_prior', M, C, mesh)
Next we define the observation model, given a value prior standard deviation and associated precision:
# Observation SD prior
sd_obs = pm.Uniform('sd_obs', 0, 1000, value=2)
tau_obs = pm.Lambda('tau_obs', lambda sd=sd_obs: sd**-2)
Finally the observation model itself, with the GP function for the mean and the observed lat/lon coordinates for each observation of TN:
Yi = pm.Normal('Yi', mu=TN_prior.f(obs_coords), tau=tau_obs, value=TN.T[0], observed=True)
M = pm.MCMC(locals())
%pdb
Automatic pdb calling has been turned ON
M.sample(10000, 5000)
[-----------------100%-----------------] 10000 of 10000 complete in 45.8 sec
pm.Matplot.summary_plot([M.amp, M.diff_degree, M.scale])
Could not calculate Gelman-Rubin statistics. Requires multiple chains of equal length.
n = 5000
m = 100
xmin, ymin = TN[:, 1:].min(0)
xmax, ymax = TN[:, 1:].max(0)
xplot = np.linspace(xmin, xmax, m)
yplot = np.linspace(ymin, ymax, m)
dplot = np.dstack(np.meshgrid(xplot, yplot))
TN[:, 1:].min(0)
array([ -22.468633, 148.69479 ])
Msurf = np.zeros(dplot.shape[:2])
E2surf = np.zeros(dplot.shape[:2])
from pymc.gp import point_eval
# Get E[v] and E[v**2] over the entire posterior
for i in xrange(n):
# Reset all variables to their values at frame i of the trace
M.remember(0, i)
# Evaluate the observed mean
Msurf_i, Vsurf_i = point_eval(M.TN_prior.M_obs.value, M.TN_prior.C_obs.value, dplot)
Msurf += Msurf_i/n
# Evaluate the observed covariance with one argument
E2surf += (Vsurf_i + Msurf_i**2)/n
# Get the posterior variance and standard deviation
Vsurf = E2surf - Msurf**2
SDsurf = np.sqrt(Vsurf)
TN[:, 1][::-1]
array([-22.468633 , -22.393917 , -22.321133 , -22.2509 , -21.650667 , -21.558117 , -21.486233 , -21.4343 , -21.385383 , -21.321117 , -21.321117 , -21.275633 , -21.232067 , -21.173367 , -21.163617 , -21.15721 , -21.13642 , -21.01979 , -20.93864 , -20.86126 , -20.84345 , -20.79908 , -20.79589 , -20.76852167, -20.69046667, -20.68393 , -20.65378 , -20.62313 , -20.62313 , -20.60684 , -20.60684 , -20.59322 , -20.59190833, -20.59190833, -20.59055 , -20.57766 , -20.57646 , -20.53176 , -20.51947 , -20.50137 , -20.50137 , -20.4646667 , -20.4646667 , -20.41566 , -20.3826667 , -20.3826667 , -20.3785 , -20.3785 , -20.3785 , -20.37817 , -20.3778333 , -20.3778333 , -20.3778333 , -20.312 , -20.30319833, -20.26761 , -20.2516667 , -20.2516667 , -20.2516667 , -20.2466667 , -20.2466667 , -20.2466667 , -20.23743 , -20.1639167 , -20.1639167 , -20.1603 , -20.1073333 , -20.1073333 , -20.1073333 , -20.30173333, -20.30173333, -20.30173333, -20.30173333, -20.46845 , -20.46845 , -20.46845 , -20.46845 , -20.46845 , -20.46845 , -20.46845 , -20.46845 , -20.37798333, -20.37798333, -20.37798333, -20.37798333, -20.37798333, -20.37798333, -20.37798333, -20.37798333, -20.37798333, -20.37798333, -20.37798333, -20.37798333, -20.16811667, -20.16811667, -20.16811667, -20.16811667, -20.16811667, -20.16811667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.34498333, -20.34498333, -20.34498333, -20.34498333, -20.34498333, -20.34498333, -20.34498333, -20.34498333, -20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.25561667])
TN[:, 1]
array([-20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.25561667, -20.34498333, -20.34498333, -20.34498333, -20.34498333, -20.34498333, -20.34498333, -20.34498333, -20.34498333, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.10481667, -20.16811667, -20.16811667, -20.16811667, -20.16811667, -20.16811667, -20.16811667, -20.37798333, -20.37798333, -20.37798333, -20.37798333, -20.37798333, -20.37798333, -20.37798333, -20.37798333, -20.37798333, -20.37798333, -20.37798333, -20.37798333, -20.46845 , -20.46845 , -20.46845 , -20.46845 , -20.46845 , -20.46845 , -20.46845 , -20.46845 , -20.30173333, -20.30173333, -20.30173333, -20.30173333, -20.1073333 , -20.1073333 , -20.1073333 , -20.1603 , -20.1639167 , -20.1639167 , -20.23743 , -20.2466667 , -20.2466667 , -20.2466667 , -20.2516667 , -20.2516667 , -20.2516667 , -20.26761 , -20.30319833, -20.312 , -20.3778333 , -20.3778333 , -20.3778333 , -20.37817 , -20.3785 , -20.3785 , -20.3785 , -20.3826667 , -20.3826667 , -20.41566 , -20.4646667 , -20.4646667 , -20.50137 , -20.50137 , -20.51947 , -20.53176 , -20.57646 , -20.57766 , -20.59055 , -20.59190833, -20.59190833, -20.59322 , -20.60684 , -20.60684 , -20.62313 , -20.62313 , -20.65378 , -20.68393 , -20.69046667, -20.76852167, -20.79589 , -20.79908 , -20.84345 , -20.86126 , -20.93864 , -21.01979 , -21.13642 , -21.15721 , -21.163617 , -21.173367 , -21.232067 , -21.275633 , -21.321117 , -21.321117 , -21.385383 , -21.4343 , -21.486233 , -21.558117 , -21.650667 , -22.2509 , -22.321133 , -22.393917 , -22.468633 ])
plt.imshow?
ymin, ymax, xmax, xmin
(148.69478999999998, 150.49254999999999, -20.1048166666667, -22.468632999999997)
Msurf.shape
(100, 100)
# Plot mean and standard deviation surfaces
plt.imshow(Msurf.T, extent=[ymin, ymax, xmin, xmax],interpolation='nearest', origin='lower')
plt.plot(TN[:, 2], TN[:, 1], 'r.', markersize=4)
plt.axis([ymin, ymax, xmin, xmax])
plt.title('Posterior predictive mean surface')
plt.colorbar()
<matplotlib.colorbar.Colorbar instance at 0x114eb0950>
plt.imshow(SDsurf.T, extent=[ymin, ymax, xmin, xmax],interpolation='nearest', origin='lower')
plt.plot(TN[:, 2], TN[:, 1],'r.',markersize=4)
plt.axis([ymin, ymax, xmin, xmax])
plt.title('Posterior predictive standard deviation surface')
plt.colorbar()
<matplotlib.colorbar.Colorbar instance at 0x1151a76c8>