In [1]:

```
using Pkg; Pkg.instantiate(); Pkg.precompile()
```

In [2]:

```
using Random; Random.seed!(2020);
```

We will pretend that we are a mining company interested in extracting a mineral resource such as Gold (Au) from a mine $\mathcal{D}$. Based on the fact that we are trying to maximize profit, we end up collecting samples $\mathcal{S} = \{Au(x_1),Au(x_2),\ldots,Au(x_N)\}$ at nearby locations $x_1,x_2,\ldots,x_N$ where we believe we will find the mineral we are after.

Our data acquisition process suffers from *sampling bias*: most of our samples reflect high concentrations of Gold.

**Problem statement**:

Estimate the remaining amount of Gold in the mine $\mathcal{D}$ from the spatial samples $\mathcal{S}$.

*Can we do any better than multiplying the sample average by the volume of the mine?*

In [3]:

```
using GeoStats
using GeoStatsImages
using Plots; gr(c=:cividis)
Random.seed!(2020)
Au = geostatsimage("WalkerLakeTruth")[:prop]
𝒟 = RegularGridData(OrderedDict(:Au=>Au))
𝒮 = sample(𝒟, 50, vec(𝒟[:Au]), replace=false)
plot(plot(𝒟), plot(𝒮), size=(900,400))
```

Out[3]:

The mean value of Gold in the mine is:

In [4]:

```
μ𝒟 = mean(𝒟[:Au])
```

Out[4]:

whereas the sample average is much higher:

In [5]:

```
μ𝒮 = mean(𝒮[:Au])
```

Out[5]:

Notice that besides suffering from sampling bias, our sampling process leads to samples that are *clustered in space*. To quantify this observation, let's partition $\mathcal{S}$ into blocks $\mathcal{B}_1,\mathcal{B}_2,\ldots,\mathcal{B}_M$ of size $b^2$ and count the number of samples that share a block:

In [6]:

```
ℬ = partition(𝒮, BlockPartitioner(50.,50.))
p₁ = plot(ℬ, colorbar=false, xlabel="x", ylabel="y")
p₂ = bar(npoints.(ℬ), xlabel="block", ylabel="counts", legend=false)
plot(p₁, p₂, size=(900,400))
```

Out[6]:

Samples that are close to each other are redundant, and shouldn't receive the same "importance" in the mean estimate compared with isolated samples. We can use the block counts $|\mathcal{B}_j|$ to assign importance weights $w_b(x_i) = \frac{1}{|\mathcal{B}_j|}$ to the samples $Au(x_i)$ based on their locations $x_i \in \mathcal{B}_j$:

In [7]:

```
𝒲 = weight(𝒮, BlockWeighter(50.,50.))
plot(𝒲, c=:Oranges)
```

Out[7]:

These weights $w_b(x_i)$, which are a function of the block size $b^2$, can be used in a weighted average

$$\mu_\mathcal{B} = \frac{\sum_{i=1}^N w_b(x_i) Au(x_i)}{\sum_{i=1}^N w_b(x_i)}$$that generalizes the sample average $\mu_\mathcal{S} = \lim_{b\to 0} \mu_\mathcal{B}$.

We can plot the weighted average for increasing block sizes to notice that the sample average (i.e. uniform weights) is recovered when the block size is too small (each sample is its own block), or when the block size is too large (all samples are in a single block):

In [8]:

```
bs = range(1, stop=120, length=100)
μs = [mean(𝒮, :Au, b) for b in bs]
plot(xlabel="block size", ylabel="mean estimate", legend=:bottomright)
plot!(bs, μs, c=:green, label="weighted average")
hline!([μ𝒮], c=:red, ls=:dash, label="sample average")
hline!([μ𝒟], c=:black, ls=:dash, label="true average")
```

Out[8]:

In case the block size is ommited, GeoStats.jl uses a heuristic to select a "reasonable" block size for the given spatial configuration:

In [9]:

```
μ𝒮 = mean(𝒮[:Au])
μℬ = mean(𝒮, :Au)
println("Sample average → ", μ𝒮)
println("Weighted average → ", μℬ)
println("True average → ", μ𝒟)
```

We can compare the difference, in volume of Gold, between the two statistics:

In [10]:

```
𝒱 = volume(boundbox(𝒮))
(μ𝒮 - μℬ) * 𝒱
```

Out[10]:

The idea of assigning importance weights to samples via spatial declustering is general, and holds for any statistic of interest. Hence, the term *declustered statistics*. To give another example, we can obtain better estimates of any quantile of the Gold distribution by considering the coordinates of the samples:

In [11]:

```
quantile(𝒮[:Au], [0.25,0.50,0.75])
```

Out[11]:

In [12]:

```
quantile(𝒮, :Au, [0.25,0.50,0.75])
```

Out[12]:

- Spatial samples can be weighted based on their coordinates to improve volumetric estimates of resources.
- Spatial declustering is particularly useful in the presence of sampling bias and spatial correlation.
- GeoStats.jl changes the semantics of statistics such as
`mean`

,`var`

and`quantile`

in a spatial context.