#!/usr/bin/env python
# coding: utf-8

# # LISA cluster maps with `PySAL`
# 
# * Dani Arribas-Bel ([@darribas](http://twitter.com/darribas))
# 
# This notebook is available as a [gist](https://gist.github.com/darribas/657e0568df7a63362762), or viewable online as [static html](http://nbviewer.ipython.org/urls/gist.githubusercontent.com/darribas/657e0568df7a63362762/raw/pysal_lisa_maps.ipynb).
# 
# ----
# 
# [Local Indicators of Spatial Association](https://en.wikipedia.org/wiki/Indicators_of_spatial_association) (LISAs, [Anselin, 1995](http://onlinelibrary.wiley.com/doi/10.1111/j.1538-4632.1995.tb00338.x/abstract)) are a common tool to explore spatial heterogeneity, identify spatial concentrations of (dis)similar values, and test the probability that such agglomerations originate from a random process.
# 
# [`PySAL`](http://pysal.org) has had functionality to run LISAs since its very beginning. However, the output of such a computation is not very useful by itself because, being a local statistic, a LISA implies running a test for every single observation. The usual approach has been to visualize the significant clusters on a map, using a simple color code for each class (High-High, HH; Low-Low, LL; High-Low, HL; Low-High, LH). Such visualizations have long been available in packages such as [GeoDa](https://geodacenter.asu.edu/projects/opengeoda), but relying on them means breaking the workflow to switch between Python and GeoDa, with the added drawback that the analysis can no longer be automated, a convenient feature for [reproducible science](https://en.wikipedia.org/wiki/Reproducibility).
# 
# In this notebook, we show how to leverage the visualization layer being built in `PySAL` to create generic and custom LISA cluster maps.

# ## Set up
# 
# For this example, we will use the `NAT` dataset, included in `PySAL` by default, exploring the variable `HR90`, homicide rates in 1990 at the county level.
# 
# Let's start by importing the required code. This is all code that will be made available in the upcoming release of `PySAL` in July 2015 (if you're reading this afterwards, it's all in there by default!).

# In[5]:

get_ipython().run_line_magic('matplotlib', 'inline')

import pysal as ps
import numpy as np
from pysal.contrib.viz import mapping as maps

shp_link = ps.examples.get_path('NAT.shp')
print('Reading from', shp_link)
hr90 = np.array(ps.open(shp_link.replace('.shp', '.dbf')).by_col('HR90'))


# Before any further analysis, it is always good practice to visualize the distribution of values. We can now conveniently do that in `PySAL`:

# In[15]:

_ = maps.plot_choropleth(shp_link, hr90, 'quantiles', cmap='Greens', figsize=(9, 6))


# Before we can plot a map, we need to run a LISA. In the spirit of the flexibility and modularity envisioned in `PySAL`, the two operations (computation and visualization) are decoupled. We will use a simple queen contiguity matrix as the spatial weights.

# In[45]:

w = ps.queen_from_shapefile(shp_link)
lisa = ps.Moran_Local(hr90, w, permutations=9999)


# ## LISA maps in one line of code
# 
# The simplest way to obtain a quick visualization of LISA results is with the "one-liner" `plot_lisa_cluster`:

# In[46]:

_ = maps.plot_lisa_cluster(shp_link, lisa, figsize=(9, 6))
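# Under the hood, the map above is just a classification of the `Moran_Local` results. Assuming the standard attribute layout of PySAL's `Moran_Local` (`q` holds the quadrant of each observation, with 1=HH, 2=LH, 3=LL, 4=HL, and `p_sim` the pseudo p-values from the permutation test), we can tabulate the significant clusters ourselves. A minimal sketch:

# In[ ]:

# Count observations in each significant cluster class (sketch; assumes the
# `q`/`p_sim` attributes of `Moran_Local` and a 1% significance threshold)
sig = lisa.p_sim < 0.01
quad_labels = {1: 'HH', 2: 'LH', 3: 'LL', 4: 'HL'}
counts = {lab: int(((lisa.q == q) & sig).sum()) for q, lab in quad_labels.items()}
counts['Not significant'] = int((~sig).sum())
print(counts)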
# ## (Prettier) LISA maps using `cartopy`
# 
# The previous map is a very quick, painless way to obtain a visualization of LISA results. This is incredibly useful for interactive work where one wants to iterate quickly to get a sense of the data. However, once the researcher settles on a particular visualization, some "eye-candy" can be applied that notably improves the final result.
# 
# In this section, we use the fantastic library [`cartopy`](http://scitools.org.uk/cartopy/) to produce a publication-ready LISA plot. This, in part, uses some of the internal machinery in `pysal.contrib.viz`, combined with plain `matplotlib` functionality.
# 
# **NOTE**: besides `PySAL`, you will need `cartopy` installed on your system to reproduce this example.

# In[24]:

import matplotlib.pyplot as plt
import cartopy.crs as ccrs


# In[47]:

orig_crs = ccrs.PlateCarree()
projection = ccrs.LambertConformal()
p_thres = 0.01

shp = ps.open(shp_link)
polys = maps.map_poly_shp(shp)
polys = maps.base_lisa_cluster(polys, lisa, p_thres=p_thres)
polys.set_edgecolor('1')
polys.set_linewidth(0.2)
polys.set_transform(orig_crs)

f = plt.figure(figsize=(12, 8))
ax = plt.axes(projection=projection)
extent = [shp.bbox[0], shp.bbox[2], shp.bbox[1], shp.bbox[3]]
ax.set_extent(extent, crs=ccrs.PlateCarree())
ax.add_collection(polys)
ax.outline_patch.set_visible(False)

boxes, labels = maps.lisa_legend_components(lisa, p_thres=p_thres)
plt.legend(boxes, labels, loc='lower left', fancybox=True)

plt.title('HR90 | LISA clusters | P-value = %.2f' % p_thres)
plt.show()


# Being able to produce this kind of map in an automated way is handy because it then becomes very straightforward to plot a series of them. As an example, we will plot a different map for solutions that consider different significance thresholds.

# In[48]:

orig_crs = ccrs.PlateCarree()
projection = ccrs.LambertConformal()
p_thresS = [0.1, 0.05, 0.01, 0.001]

f = plt.figure(figsize=(16, 10))
for i, p_thres in enumerate(p_thresS):
    shp = ps.open(shp_link)
    polys = maps.map_poly_shp(shp)
    polys = maps.base_lisa_cluster(polys, lisa, p_thres=p_thres)
    polys.set_edgecolor('1')
    polys.set_linewidth(0.2)
    polys.set_transform(orig_crs)

    ax = plt.subplot(2, 2, i + 1, projection=projection)
    extent = [shp.bbox[0], shp.bbox[2], shp.bbox[1], shp.bbox[3]]
    ax.set_extent(extent, crs=ccrs.PlateCarree())
    ax.add_collection(polys)
    ax.outline_patch.set_visible(False)

    boxes, labels = maps.lisa_legend_components(lisa, p_thres=p_thres)
    plt.legend(boxes, labels, loc='lower left', frameon=False)
    ax.set_title('P-value = %.3f' % p_thres)
plt.show()
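# Because everything happens inside Python, the recipe above can also be bundled into a small helper and reused for any threshold with a one-line call. The sketch below simply repackages the same `pysal.contrib.viz` and `cartopy` calls from the previous cells (the function name `lisa_panel` is ours, not part of `PySAL`):

# In[ ]:

def lisa_panel(ax, shp_link, lisa, p_thres, crs=ccrs.PlateCarree()):
    """Draw a LISA cluster map on an existing cartopy axis (sketch)."""
    shp = ps.open(shp_link)
    polys = maps.map_poly_shp(shp)
    polys = maps.base_lisa_cluster(polys, lisa, p_thres=p_thres)
    polys.set_edgecolor('1')
    polys.set_linewidth(0.2)
    polys.set_transform(crs)
    ax.set_extent([shp.bbox[0], shp.bbox[2], shp.bbox[1], shp.bbox[3]], crs=crs)
    ax.add_collection(polys)
    ax.outline_patch.set_visible(False)
    boxes, labels = maps.lisa_legend_components(lisa, p_thres=p_thres)
    ax.legend(boxes, labels, loc='lower left', frameon=False)
    ax.set_title('P-value = %.3f' % p_thres)
    return ax

# One-line call for a single panel at the 5% threshold
f = plt.figure(figsize=(12, 8))
ax = plt.axes(projection=ccrs.LambertConformal())
lisa_panel(ax, shp_link, lisa, 0.05)
plt.show()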