In [1]:
using Pkg; Pkg.instantiate(); Pkg.precompile()
Precompiling project...
In [2]:
using Random; Random.seed!(2020);

How much Gold is left?

We will pretend that we are a mining company interested in extracting a mineral resource such as Gold (Au) from a mine $\mathcal{D}$. Because we are trying to maximize profit, we end up collecting samples $\mathcal{S} = \{Au(x_1),Au(x_2),\ldots,Au(x_N)\}$ at nearby locations $x_1,x_2,\ldots,x_N$ where we believe we will find the mineral we are after.

Our data acquisition process suffers from sampling bias: most of our samples reflect high concentrations of Gold.

Problem statement:

Estimate the remaining amount of Gold in the mine $\mathcal{D}$ from the spatial samples $\mathcal{S}$.

Can we do any better than multiplying the sample average by the volume of the mine?

$$\underbrace{\left(\frac{1}{N}\sum_{i=1}^{N} Au(x_i)\right)}_{\mu_\mathcal{S}:\text{ sample average}} \times \mathcal{V}(\mathcal{D})$$
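To see why this naive estimate can go wrong, here is a minimal plain-Julia sketch (independent of GeoStats.jl, using synthetic data) of preferential sampling: when locations are chosen with probability proportional to grade, the sample average overshoots the true average.

```julia
using Random, Statistics

Random.seed!(42)

# synthetic deposit: grades increase toward one corner of a 100×100 grid
grades = [clamp((i + j)/200 + 0.1randn(), 0, 1) for i in 1:100, j in 1:100]
flat   = vec(grades)

# preferential sampling without replacement, weighted by grade
# (weighted reservoir keys: larger key ⇒ more likely selected)
ks  = rand(length(flat)) .^ (1 ./ max.(flat, 1e-6))
idx = partialsortperm(ks, 1:50, rev=true)
samples = flat[idx]

# the naive estimate μ_S × V overshoots the true content μ_D × V
mean(flat), mean(samples)
```

The sample average `mean(samples)` ends up well above the true average `mean(flat)`, which is exactly the bias we observe in the mine data below.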
In [3]:
using GeoStats
using GeoStatsImages
using Plots; gr(c=:cividis)

Random.seed!(2020)

Au = geostatsimage("WalkerLakeTruth")[:prop]

𝒟 = RegularGridData(OrderedDict(:Au=>Au))

𝒮 = sample(𝒟, 50, vec(𝒟[:Au]), replace=false)

plot(plot(𝒟), plot(𝒮), size=(900,400))
Out[3]:

The mean value of Gold in the mine is:

In [4]:
Ī¼š’Ÿ = mean(š’Ÿ[:Au])
Out[4]:
0.5

whereas the sample average is much higher:

In [5]:
Ī¼š’® = mean(š’®[:Au])
Out[5]:
0.6477059962307209

Spatial declustering

Notice that besides suffering from sampling bias, our sampling process leads to samples that are clustered in space. To quantify this observation, let's partition $\mathcal{S}$ into blocks $\mathcal{B}_1,\mathcal{B}_2,\ldots,\mathcal{B}_M$ of size $b^2$ and count the number of samples that share a block:

In [6]:
ℬ = partition(𝒮, BlockPartitioner(50.,50.))

p₁ = plot(ℬ, colorbar=false, xlabel="x", ylabel="y")

p₂ = bar(npoints.(ℬ), xlabel="block", ylabel="counts", legend=false)

plot(p₁, p₂, size=(900,400))
Out[6]:

Samples that are close to each other are redundant, and shouldn't receive the same "importance" in the mean estimate as isolated samples. We can use the block counts $|\mathcal{B}_j|$ to assign importance weights $w_b(x_i) = \frac{1}{|\mathcal{B}_j|}$ to the samples $Au(x_i)$ based on their locations $x_i \in \mathcal{B}_j$:

In [7]:
š’² = weight(š’®, BlockWeighter(50.,50.))

plot(š’², c=:Oranges)
Out[7]:

These weights $w_b(x_i)$, which are a function of the block size $b^2$, can be used in a weighted average

$$\mu_\mathcal{B} = \frac{\sum_{i=1}^N w_b(x_i) Au(x_i)}{\sum_{i=1}^N w_b(x_i)}$$

that generalizes the sample average $\mu_\mathcal{S} = \lim_{b\to 0} \mu_\mathcal{B}$.
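The weighted average above is straightforward to compute by hand. Below is a minimal plain-Julia sketch (independent of GeoStats.jl; `block_weights` and `weighted_mean` are hypothetical helper names, not the library's API) that also illustrates the limit: with a tiny block size every sample gets its own block, and the weighted average collapses back to the sample average.

```julia
using Statistics

# assign each point to a b×b block and weight it by 1 / (samples in its block)
function block_weights(xs, ys, b)
    blocks = [(floor(Int, x/b), floor(Int, y/b)) for (x, y) in zip(xs, ys)]
    counts = Dict{Tuple{Int,Int},Int}()
    for k in blocks
        counts[k] = get(counts, k, 0) + 1
    end
    [1 / counts[k] for k in blocks]
end

weighted_mean(z, w) = sum(w .* z) / sum(w)

xs = [1.0, 1.2, 1.1, 9.0]   # three clustered points and one isolated point
ys = [1.0, 1.1, 0.9, 9.0]
z  = [0.9, 0.8, 0.85, 0.2]  # cluster sits on high grades (sampling bias)

w = block_weights(xs, ys, 5.0)  # the cluster shares one block → weight 1/3 each
weighted_mean(z, w)             # pulled toward the isolated low sample

block_weights(xs, ys, 0.01)     # tiny blocks → uniform weights → sample average
```

With `b = 5.0` the three clustered samples share a block and jointly count as one, so the estimate drops toward the isolated low sample; with a tiny `b` all weights are 1 and we recover $\mu_\mathcal{S}$.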

We can plot the weighted average for increasing block sizes to notice that the sample average (i.e. uniform weights) is recovered when the block size is too small (each sample is its own block), or when the block size is too large (all samples are in a single block):

In [8]:
bs = range(1, stop=120, length=100)
μs = [mean(𝒮, :Au, b) for b in bs]

plot(xlabel="block size", ylabel="mean estimate", legend=:bottomright)
plot!(bs, μs, c=:green, label="weighted average")
hline!([μ𝒮], c=:red, ls=:dash, label="sample average")
hline!([μ𝒟], c=:black, ls=:dash, label="true average")
Out[8]:

If the block size is omitted, GeoStats.jl uses a heuristic to select a "reasonable" block size for the given spatial configuration:

In [9]:
Ī¼š’® = mean(š’®[:Au])
Ī¼ā„¬ = mean(š’®, :Au)

println("Sample average   ā†’ ", Ī¼š’®)
println("Weighted average ā†’ ", Ī¼ā„¬)
println("True average     ā†’ ", Ī¼š’Ÿ)
Sample average   ā†’ 0.6477059962307209
Weighted average ā†’ 0.5781550359878287
True average     ā†’ 0.5

We can compute the difference, in volume of Gold, implied by the two statistics:

In [10]:
š’± = volume(boundbox(š’®))

(Ī¼š’® - Ī¼ā„¬) * š’±
Out[10]:
4914.1926469217915

Declustered statistics

The idea of assigning importance weights to samples via spatial declustering is general, and holds for any statistic of interest. Hence, the term declustered statistics. To give another example, we can obtain better estimates of any quantile of the Gold distribution by considering the coordinates of the samples:

Non-spatial quantile

In [11]:
quantile(š’®[:Au], [0.25,0.50,0.75])
Out[11]:
3-element Array{Float64,1}:
 0.48499019218195105
 0.7015602764138003
 0.8265362376440724

Spatial quantile

In [12]:
quantile(š’®, :Au, [0.25,0.50,0.75])
Out[12]:
3-element Array{Float64,1}:
 0.3783833027445125
 0.5840784236597003
 0.7933555854266376
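A declustered quantile can be understood as inverting a weighted empirical CDF. Here is a minimal plain-Julia sketch of that idea (`weighted_quantile` is a hypothetical helper illustrating the concept, not the GeoStats.jl implementation), reusing block-declustering weights where a cluster of three high samples shares one block:

```julia
using Statistics

# invert the weighted empirical CDF: smallest z whose cumulative weight ≥ p
function weighted_quantile(z, w, p)
    ord = sortperm(z)
    zs, ws = z[ord], w[ord]
    cum = cumsum(ws) ./ sum(ws)
    zs[findfirst(>=(p), cum)]
end

z = [0.2, 0.8, 0.85, 0.9]    # one isolated low sample, three clustered high ones
w = [1.0, 1/3, 1/3, 1/3]     # declustering weights: the cluster shares a block

weighted_quantile(z, w, 0.5) # weighted median: 0.2
median(z)                    # unweighted median: 0.825
```

Down-weighting the redundant cluster shifts the quantiles toward the isolated sample, which is the same effect we see in `Out[12]` versus `Out[11]` above.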

Remarks

  • Spatial samples can be weighted based on their coordinates to improve volumetric estimates of resources.
  • Spatial declustering is particularly useful in the presence of sampling bias and spatial correlation.
  • GeoStats.jl changes the semantics of statistics such as mean, var and quantile in a spatial context.