#!/usr/bin/env python
# coding: utf-8

# # LISA cluster maps with `PySAL`
# 
# * Dani Arribas-Bel ([@darribas](http://twitter.com/darribas))
# 
# This notebook is available as a [gist](https://gist.github.com/darribas/657e0568df7a63362762), or viewable online as [static html](http://nbviewer.ipython.org/urls/gist.githubusercontent.com/darribas/657e0568df7a63362762/raw/pysal_lisa_maps.ipynb).
# 
# ----
# 
# [Local Indicators of Spatial Association](https://en.wikipedia.org/wiki/Indicators_of_spatial_association) (LISAs, [Anselin, 1995](http://onlinelibrary.wiley.com/doi/10.1111/j.1538-4632.1995.tb00338.x/abstract)) are a common tool to explore spatial heterogeneity, identify spatial concentrations of (dis)similar values, and test the probability that such agglomerations originate from a random process.
# 
# [`PySAL`](http://pysal.org) has had functionality to run LISAs since its very beginning. However, the output of such a computation is not very useful by itself because, being a local statistic, a LISA implies running a test for every single observation. The usual approach has been to visualize the significant clusters on a map, using a simple color code for each class (High-High, HH; Low-Low, LL; High-Low, HL; Low-High, LH). Such visualizations have long been available in packages such as [GeoDa](https://geodacenter.asu.edu/projects/opengeoda), but relying on them means breaking the workflow to switch between Python and GeoDa, with the added drawback that the analysis can no longer be automated, a convenient feature for [reproducible science](https://en.wikipedia.org/wiki/Reproducibility).
# 
# In this notebook, we show how to leverage the visualization layer being built in `PySAL` to create generic and custom LISA cluster maps.

# ## Set up
# 
# For this example, we will use the `NAT` dataset, included in `PySAL` by default, exploring the variable `HR90`, homicide rates in 1990 at the county level.
# 
# Let's start by importing the required code. This is all code that will be made available in the upcoming release of `PySAL` in July 2015 (if you're reading this afterwards, it's all in there by default!).

# In[5]:

get_ipython().run_line_magic('matplotlib', 'inline')

import pysal as ps
import numpy as np
from pysal.contrib.viz import mapping as maps

shp_link = ps.examples.get_path('NAT.shp')
print('Reading from', shp_link)
hr90 = np.array(ps.open(shp_link.replace('.shp', '.dbf')).by_col('HR90'))


# Before any further analysis, it is always good practice to visualize the distribution of values. We can now conveniently do that in `PySAL`:

# In[15]:

_ = maps.plot_choropleth(shp_link, hr90, 'quantiles', cmap='Greens', figsize=(9, 6))


# Before we can plot a map, we need to run a LISA. In the spirit of the flexibility and modularity envisioned in `PySAL`, the two operations (computation and visualization) are decoupled. We will use a simple queen contiguity matrix as the spatial weights.

# In[45]:

w = ps.queen_from_shapefile(shp_link)
lisa = ps.Moran_Local(hr90, w, permutations=9999)


# ## LISA maps in one line of code
# 
# The simplest way to obtain a quick visualization of LISA results is with the "one-liner" `plot_lisa_cluster`:

# In[46]:

_ = maps.plot_lisa_cluster(shp_link, lisa, figsize=(9, 6))
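# Under the hood, the map above is just a classification of the `Moran_Local` results. Assuming the standard attribute layout of PySAL's `Moran_Local` (`q` holds the quadrant of each observation, with 1=HH, 2=LH, 3=LL, 4=HL, and `p_sim` the pseudo p-values from the permutation test), we can tabulate the significant clusters ourselves. A minimal sketch:

# In[ ]:

# Count observations in each significant cluster class (sketch; assumes the
# `q`/`p_sim` attributes of `Moran_Local` and a 1% significance threshold)
sig = lisa.p_sim < 0.01
quad_labels = {1: 'HH', 2: 'LH', 3: 'LL', 4: 'HL'}
counts = {lab: int(((lisa.q == q) & sig).sum()) for q, lab in quad_labels.items()}
counts['Not significant'] = int((~sig).sum())
print(counts)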
# ## (Prettier) LISA maps using `cartopy`
# 
# The previous map is a very quick, painless way to obtain a visualization of LISA results. This is incredibly useful for interactive work where one wants to iterate quickly to get a sense of the data. However, once the researcher settles on a particular visualization, some "eye-candy" can be applied that notably improves the final result.
# 
# In this section, we use the fantastic library [`cartopy`](http://scitools.org.uk/cartopy/) to produce a publication-ready LISA plot. This, in part, uses some of the internal machinery in `pysal.contrib.viz`, combined with plain `matplotlib` functionality.
# 
# **NOTE**: besides `PySAL`, you will need `cartopy` installed on your system to reproduce this example.

# In[24]:

import matplotlib.pyplot as plt
import cartopy.crs as ccrs


# In[47]:

orig_crs = ccrs.PlateCarree()
projection = ccrs.LambertConformal()
p_thres = 0.01

shp = ps.open(shp_link)
polys = maps.map_poly_shp(shp)
polys = maps.base_lisa_cluster(polys, lisa, p_thres=p_thres)
polys.set_edgecolor('1')
polys.set_linewidth(0.2)
polys.set_transform(orig_crs)

f = plt.figure(figsize=(12, 8))
ax = plt.axes(projection=projection)
extent = [shp.bbox[0], shp.bbox[2], shp.bbox[1], shp.bbox[3]]
ax.set_extent(extent, crs=ccrs.PlateCarree())
ax.add_collection(polys)
ax.outline_patch.set_visible(False)

boxes, labels = maps.lisa_legend_components(lisa, p_thres=p_thres)
plt.legend(boxes, labels, loc='lower left', fancybox=True)

plt.title('HR90 | LISA clusters | P-value = %.2f' % p_thres)
plt.show()


# Being able to produce this kind of map in an automated way is handy because it then becomes very straightforward to plot a series of them. As an example, we will plot a different map for solutions that consider different significance thresholds.

# In[48]:

orig_crs = ccrs.PlateCarree()
projection = ccrs.LambertConformal()
p_thresS = [0.1, 0.05, 0.01, 0.001]

f = plt.figure(figsize=(16, 10))
for i, p_thres in enumerate(p_thresS):
    shp = ps.open(shp_link)
    polys = maps.map_poly_shp(shp)
    polys = maps.base_lisa_cluster(polys, lisa, p_thres=p_thres)
    polys.set_edgecolor('1')
    polys.set_linewidth(0.2)
    polys.set_transform(orig_crs)

    ax = plt.subplot(2, 2, i + 1, projection=projection)
    extent = [shp.bbox[0], shp.bbox[2], shp.bbox[1], shp.bbox[3]]
    ax.set_extent(extent, crs=ccrs.PlateCarree())
    ax.add_collection(polys)
    ax.outline_patch.set_visible(False)

    boxes, labels = maps.lisa_legend_components(lisa, p_thres=p_thres)
    plt.legend(boxes, labels, loc='lower left', frameon=False)
    ax.set_title('P-value = %.3f' % p_thres)
plt.show()
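# Because everything happens inside Python, the recipe above can also be bundled into a small helper and reused for any threshold with a one-line call. The sketch below simply repackages the same `pysal.contrib.viz` and `cartopy` calls from the previous cells (the function name `lisa_panel` is ours, not part of `PySAL`):

# In[ ]:

def lisa_panel(ax, shp_link, lisa, p_thres, crs=ccrs.PlateCarree()):
    """Draw a LISA cluster map on an existing cartopy axis (sketch)."""
    shp = ps.open(shp_link)
    polys = maps.map_poly_shp(shp)
    polys = maps.base_lisa_cluster(polys, lisa, p_thres=p_thres)
    polys.set_edgecolor('1')
    polys.set_linewidth(0.2)
    polys.set_transform(crs)
    ax.set_extent([shp.bbox[0], shp.bbox[2], shp.bbox[1], shp.bbox[3]], crs=crs)
    ax.add_collection(polys)
    ax.outline_patch.set_visible(False)
    boxes, labels = maps.lisa_legend_components(lisa, p_thres=p_thres)
    ax.legend(boxes, labels, loc='lower left', frameon=False)
    ax.set_title('P-value = %.3f' % p_thres)
    return ax

# One-line call for a single panel at the 5% threshold
f = plt.figure(figsize=(12, 8))
ax = plt.axes(projection=ccrs.LambertConformal())
lisa_panel(ax, shp_link, lisa, 0.05)
plt.show()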