# coding: utf-8
# # Distance Based Statistical Method for Planar Point Patterns
#
# **Authors: Serge Rey and Wei Kang **
#
# ## Introduction
#
# Distance based methods for point patterns are of three types:
#
# * [Mean Nearest Neighbor Distance Statistics](#Mean-Nearest-Neighbor-Distance-Statistics)
# * [Nearest Neighbor Distance Functions](#Nearest-Neighbor-Distance-Functions)
# * [Interevent Distance Functions](#Interevent-Distance-Functions)
#
# In addition, we are going to introduce a computational technique [Simulation Envelopes](#Simulation-Envelopes) to aid in making inferences about the data generating process. An [example](#CSR-Example) is used to demonstrate how to use and interprete simulation envelopes.
# In[1]:
import scipy.spatial
import libpysal as ps
import numpy as np
from pointpats import PointPattern, PoissonPointProcess, as_window, G, F, J, K, L, Genv, Fenv, Jenv, Kenv, Lenv
get_ipython().run_line_magic('matplotlib', 'inline')
import matplotlib.pyplot as plt
# ## Mean Nearest Neighbor Distance Statistics
#
# The nearest neighbor(s) for a point $u$ is the point(s) $N(u)$ which meet the condition
# $$d_{u,N(u)} \leq d_{u,j} \forall j \in S - u$$
#
# The distance between the nearest neighbor(s) $N(u)$ and the point $u$ is nearest neighbor distance for $u$. After searching for nearest neighbor(s) for all the points and calculating the corresponding distances, we are able to calculate mean nearest neighbor distance by averaging these distances.
#
# It was demonstrated by Clark and Evans(1954) that mean nearest neighbor distance statistics distribution is a normal distribution under null hypothesis (underlying spatial process is CSR). We can utilize the test statistics to determine whether the point pattern is the outcome of CSR. If not, is it the outcome of cluster or regular
# spatial process?
#
# Mean nearest neighbor distance statistic
#
# $$\bar{d}_{min}=\frac{1}{n} \sum_{i=1}^n d_{min}(s_i)$$
# In[2]:
points = [[66.22, 32.54], [22.52, 22.39], [31.01, 81.21],
[9.47, 31.02], [30.78, 60.10], [75.21, 58.93],
[79.26, 7.68], [8.23, 39.93], [98.73, 77.17],
[89.78, 42.53], [65.19, 92.08], [54.46, 8.48]]
pp = PointPattern(points)
# In[3]:
pp.summary()
# We may call the method **knn** in PointPattern class to find $k$ nearest neighbors for each point in the point pattern *pp*.
# In[4]:
# one nearest neighbor (default)
pp.knn()
# The first array is the ids of the most nearest neighbor for each point, the second array is the distance between each point and its most nearest neighbor.
# In[5]:
# two nearest neighbors
pp.knn(2)
# In[6]:
pp.max_nnd # Maximum nearest neighbor distance
# In[7]:
pp.min_nnd # Minimum nearest neighbor distance
# In[8]:
pp.mean_nnd # mean nearest neighbor distance
# In[9]:
pp.nnd # Nearest neighbor distances
# In[10]:
pp.nnd.sum()/pp.n # same as pp.mean_nnd
# In[11]:
pp.plot()
# ## Nearest Neighbor Distance Functions
#
# Nearest neighbour distance distribution functions (including the nearest “event-to-event” and “point-event” distance distribution functions) of a point process are cumulative distribution functions of several kinds -- $G, F, J$. By comparing the distance function of the observed point pattern with that of the point pattern from a CSR process, we are able to infer whether the underlying spatial process of the observed point pattern is CSR or not for a given confidence level.
# #### $G$ function - event-to-event
#
# The $G$ function is defined as follows: for a given distance $d$, $G(d)$ is the proportion of nearest neighbor distances that are less than $d$.
# $$G(d) = \sum_{i=1}^n \frac{ \phi_i^d}{n}$$
#
# $$
# \phi_i^d =
# \begin{cases}
# 1 & \quad \text{if } d_{min}(s_i)\pi d^2$ indicates that the underlying point process is a cluster point process.
# In[21]:
kp1 = K(pp)
kp1.plot()
# #### $L$ function - "interevent"
#
# $L$ function is a scaled version of $K$ function, defined as:
# $$L(d) = \sqrt{\frac{K(d)}{\pi}}-d$$
# In[22]:
lp1 = L(pp)
lp1.plot()
# ## Simulation Envelopes
#
# A [Simulation envelope](http://www.esajournals.org/doi/pdf/10.1890/13-2042.1) is a computer intensive technique for inferring whether an observed pattern significantly deviates from what would be expected under a specific process. Here, we always use CSR as the benchmark. In order to construct a simulation envelope for a given function, we need to simulate CSR a lot of times, say $1000$ times. Then, we can calculate the function for each simulated point pattern. For every distance $d$, we sort the function values of the $1000$ simulated point patterns. Given a confidence level, say $95\%$, we can acquire the $25$th and $975$th value for every distance $d$. Thus, a simulation envelope is constructed.
# #### Simulation Envelope for G function
#
# **Genv** class in pysal.
# In[23]:
realizations = PoissonPointProcess(pp.window, pp.n, 100, asPP=True) # simulate CSR 100 times
genv = Genv(pp, intervals=20, realizations=realizations) # call Genv to generate simulation envelope
genv
# In[24]:
genv.observed
# In[25]:
genv.plot()
# In the above figure, **LB** and **UB** comprise the simulation envelope. **CSR** is the mean function calculated from the simulated data. **G** is the function estimated from the observed point pattern. It is well below the simulation envelope. We can infer that the underlying point process is a regular one.
# #### Simulation Envelope for F function
#
# **Fenv** class in pysal.
# In[26]:
fenv = Fenv(pp, intervals=20, realizations=realizations)
fenv.plot()
# #### Simulation Envelope for J function
#
# **Jenv** class in pysal.
# In[27]:
jenv = Jenv(pp, intervals=20, realizations=realizations)
jenv.plot()
# #### Simulation Envelope for K function
#
# **Kenv** class in pysal.
# In[28]:
kenv = Kenv(pp, intervals=20, realizations=realizations)
kenv.plot()
# #### Simulation Envelope for L function
#
# **Lenv** class in pysal.
# In[29]:
lenv = Lenv(pp, intervals=20, realizations=realizations)
lenv.plot()
# ## CSR Example
# In this example, we are going to generate a point pattern as the "observed" point pattern. The data generating process is CSR. Then, we will simulate CSR in the same domain for 100 times and construct a simulation envelope for each function.
# In[30]:
from libpysal.cg import shapely_ext
from pointpats import Window
import libpysal as ps
va = ps.io.open(ps.examples.get_path("vautm17n.shp"))
polys = [shp for shp in va]
state = shapely_ext.cascaded_union(polys)
# Generate the point pattern **pp** (size 100) from CSR as the "observed" point pattern.
# In[31]:
a = [[1],[1,2]]
np.asarray(a)
# In[32]:
n = 100
samples = 1
pp = PoissonPointProcess(Window(state.parts), n, samples, asPP=True)
pp.realizations[0]
# In[33]:
pp.n
# Simulate CSR in the same domian for 100 times which would be used for constructing simulation envelope under the null hypothesis of CSR.
# In[34]:
csrs = PoissonPointProcess(pp.window, 100, 100, asPP=True)
csrs
# Construct the simulation envelope for $G$ function.
# In[35]:
genv = Genv(pp.realizations[0], realizations=csrs)
genv.plot()
# Since the "observed" $G$ is well contained by the simulation envelope, we infer that the underlying point process is a random process.
# In[36]:
genv.low # lower bound of the simulation envelope for G
# In[37]:
genv.high # higher bound of the simulation envelope for G
# Construct the simulation envelope for $F$ function.
# In[38]:
fenv = Fenv(pp.realizations[0], realizations=csrs)
fenv.plot()
# Construct the simulation envelope for $J$ function.
# In[39]:
jenv = Jenv(pp.realizations[0], realizations=csrs)
jenv.plot()
# Construct the simulation envelope for $K$ function.
# In[40]:
kenv = Kenv(pp.realizations[0], realizations=csrs)
kenv.plot()
# Construct the simulation envelope for $L$ function.
# In[41]:
lenv = Lenv(pp.realizations[0], realizations=csrs)
lenv.plot()