This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.

7.6. Estimating a probability distribution nonparametrically with a Kernel Density Estimation

You need to download the Storms dataset on the book's website, and extract it in the current directory. (

You also need matplotlib's toolkit basemap. (

  1. Let's import the usual packages. The kernel density estimation with a Gaussian kernel is implemented in SciPy.stats.
In [ ]:
import numpy as np
import pandas as pd
import scipy.stats as st
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
%matplotlib inline
  1. Let's open the data with Pandas.
In [ ]:
df = pd.read_csv("data/Allstorms.ibtracs_wmo.v03r05.csv")
  1. The dataset contains information about most storms since 1848. A single storm may appear multiple times across several consecutive days.
In [ ]:
  1. We use Pandas' groupby function to obtain the average location of every storm.
In [ ]:
dfs = df.groupby('Serial_Num')
pos = dfs[['Latitude', 'Longitude']].mean()
y, x = pos.values.T
  1. We display the storms on a map with basemap. This toolkit allows us to easily project the geographical coordinates on the map.
In [ ]:
m = Basemap(projection='mill', llcrnrlat=-65 ,urcrnrlat=85,
            llcrnrlon=-180, urcrnrlon=180)
x0, y0 = m(-180, -65)
x1, y1 = m(180, 85)
xm, ym = m(x, y)
m.plot(xm, ym, '.r', alpha=.1);
  1. To perform the Kernel Density Estimation, we need to stack the x and y coordinates of the storms into a 2xN array.
In [ ]:
h = np.vstack((xm, ym))
In [ ]:
kde = st.gaussian_kde(h)
  1. The gaussian_kde routine returned a Python function. To see the results on a map, we need to evaluate this function on a 2D grid spanning the entire map. We create this grid with meshgrid, and we pass the x, y values to the kde function. We need to arrange the shape of the array since kde accepts a 2xN array as input.
In [ ]:
k = 50
tx, ty = np.meshgrid(np.linspace(x0, x1, 2*k),
                     np.linspace(y0, y1, k))
v = kde(np.vstack((tx.ravel(), ty.ravel()))).reshape((k, 2*k))
  1. Finally, we display the estimated density with imshow.
In [ ]:
xm, ym = m(x, y)
m.imshow(v, origin='lower', extent=[x0,x1,y0,y1],

You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).

IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).