#!/usr/bin/env python # coding: utf-8 # > This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python. # # # 7.6. Estimating a probability distribution nonparametrically with a Kernel Density Estimation # You need to download the *Storms* dataset on the book's website, and extract it in the current directory. (http://ipython-books.github.io) # # You also need matplotlib's toolkit *basemap*. (http://matplotlib.org/basemap/) # 1. Let's import the usual packages. The kernel density estimation with a Gaussian kernel is implemented in *SciPy.stats*. # In[ ]: import numpy as np import pandas as pd import scipy.stats as st import matplotlib.pyplot as plt from mpl_toolkits.basemap import Basemap get_ipython().run_line_magic('matplotlib', 'inline') # 2. Let's open the data with Pandas. # In[ ]: # http://www.ncdc.noaa.gov/ibtracs/index.php?name=wmo-data df = pd.read_csv("data/Allstorms.ibtracs_wmo.v03r05.csv") # 3. The dataset contains information about most storms since 1848. A single storm may appear multiple times across several consecutive days. # In[ ]: df[df.columns[[0,1,3,8,9]]].head() # 4. We use Pandas' `groupby` function to obtain the average location of every storm. # In[ ]: dfs = df.groupby('Serial_Num') pos = dfs[['Latitude', 'Longitude']].mean() y, x = pos.values.T pos.head() # 5. We display the storms on a map with basemap. This toolkit allows us to easily project the geographical coordinates on the map. # In[ ]: m = Basemap(projection='mill', llcrnrlat=-65 ,urcrnrlat=85, llcrnrlon=-180, urcrnrlon=180) x0, y0 = m(-180, -65) x1, y1 = m(180, 85) plt.figure(figsize=(10,6)) m.drawcoastlines() m.fillcontinents(color='#dbc8b2') xm, ym = m(x, y) m.plot(xm, ym, '.r', alpha=.1); # 6. To perform the Kernel Density Estimation, we need to stack the x and y coordinates of the storms into a 2xN array. # In[ ]: h = np.vstack((xm, ym)) # In[ ]: kde = st.gaussian_kde(h) # 7. The `gaussian_kde` routine returned a Python function. To see the results on a map, we need to evaluate this function on a 2D grid spanning the entire map. We create this grid with `meshgrid`, and we pass the x, y values to the `kde` function. We need to arrange the shape of the array since `kde` accepts a 2xN array as input. # In[ ]: k = 50 tx, ty = np.meshgrid(np.linspace(x0, x1, 2*k), np.linspace(y0, y1, k)) v = kde(np.vstack((tx.ravel(), ty.ravel()))).reshape((k, 2*k)) # 8. Finally, we display the estimated density with `imshow`. # In[ ]: plt.figure(figsize=(10,6)) m.drawcoastlines() m.fillcontinents(color='#dbc8b2') xm, ym = m(x, y) m.imshow(v, origin='lower', extent=[x0,x1,y0,y1], cmap=plt.get_cmap('Reds')); # > You'll find all the explanations, figures, references, and much more in the book (to be released later this summer). # # > [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages).