# Python Is Not C: Take Two¶

## Jean-François Puget¶

An exercise in optimising Python code.

This notebook is discussed in the blog post Python Is Not C: Take Two

First, let's import useful tools and packages

In [1]:
%load_ext Cython
from numba import jit,autojit

import random
import numpy as np
import math
import pandas as pd

import sys

if sys.version_info < (3,):
    range = xrange

from matplotlib import pyplot as plt
%matplotlib inline


## Data sets¶

We will use two data sets, a random one, and one taken from RAAM. Each data set has a set of test points and a set of track points. The problem is to compute the closest point among the track points for each point in the test points.
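To make the task concrete, here is a minimal sketch of the computation we want to perform for each test point (the coordinates below are made up for illustration, and Manhattan distance is used for brevity):

```python
import numpy as np

# Three hypothetical track points and one test point, as (latitude, longitude).
track = np.array([[10.0, 20.0], [50.0, 60.0], [30.0, 40.0]])
test = np.array([29.0, 41.0])

# Distance from the test point to every track point, then the closest index.
d = np.abs(track - test).sum(axis=1)
print(d.argmin())  # 2: the point (30.0, 40.0) is closest
```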

### Random¶

Track points and waypoints are drawn uniformly in the north-east quarter of the world: 0 <= Latitude <= 90 and 0 <= Longitude <= 180. This data set has 25,000 track points and 400 waypoints.

In [2]:
def create_points(n):
    return pd.DataFrame({'Latitude': np.random.uniform(0.0, 90.0, n),
                         'Longitude': np.random.uniform(0.0, 180.0, n)})

track_points = create_points(25000)
test_points = create_points(400)


### RAAM¶

Track and waypoints from the 2015 RAAM competition. It has 25,092 track points and 428 waypoints.

In [3]:
def read_points(file):
    points = pd.read_csv(file)  # load the raw RAAM file (format assumed to be CSV)
    points = points[['Lat','Lon']]
    points.columns = ['Latitude','Longitude']
    return points

# track_points2 and test_points2 are loaded from the RAAM files (paths omitted):
# track_points2 = read_points(...); test_points2 = read_points(...)
points2 = np.zeros((len(track_points2),2))
points2[:,0] = track_points2['Latitude']
points2[:,1] = track_points2['Longitude']



Let's display these data sets by plotting their test points.

In [4]:
datasets = [test_points, test_points2]
titles = ['random', 'raam']

fig, ax = plt.subplots(1, 2, figsize=(8, 3.5))

for axi, title, dataset in zip(ax, titles, datasets):
    axi.plot(dataset['Longitude'], dataset['Latitude'], '.k')
    axi.set_title(title, size=14)
    axi.set_xlabel('Longitude')
    axi.set_ylabel('Latitude')

plt.savefig('nneighbors.png')


## Reference Code In Python¶

This is the code I wrote almost a year ago; it is described in Python Is Not C.

We use the following function from John D. Cook; it implements the great circle distance on a unit sphere.

In [5]:
def distance_on_unit_sphere(lat1, long1, lat2, long2):

    # Convert latitude and longitude to
    # spherical coordinates in radians.
    degrees_to_radians = math.pi/180.0

    # phi = 90 - latitude
    phi1 = (90.0 - lat1)*degrees_to_radians
    phi2 = (90.0 - lat2)*degrees_to_radians

    # theta = longitude
    theta1 = long1*degrees_to_radians
    theta2 = long2*degrees_to_radians

    # Compute spherical distance from spherical coordinates.

    # For two locations in spherical coordinates
    # (1, theta, phi) and (1, theta', phi')
    # cosine( arc length ) =
    #    sin phi sin phi' cos(theta-theta') + cos phi cos phi'
    # distance = rho * arc length

    cos = (math.sin(phi1)*math.sin(phi2)*math.cos(theta1 - theta2) +
           math.cos(phi1)*math.cos(phi2))
    arc = math.acos( cos )

    # Remember to multiply arc by the radius of the earth
    # in your favorite set of units to get length.
    return arc
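As a quick sanity check of the formula: the arc from a point on the equator to the North Pole is a quarter of a great circle, i.e. pi/2 on the unit sphere. The snippet below repeats the computation in condensed form; the clamp on the cosine is an extra safety guard, not part of the original function, against rounding pushing the value slightly outside acos's domain.

```python
import math

def great_circle(lat1, long1, lat2, long2):
    # Condensed version of distance_on_unit_sphere above.
    d2r = math.pi / 180.0
    phi1, phi2 = (90.0 - lat1) * d2r, (90.0 - lat2) * d2r
    theta1, theta2 = long1 * d2r, long2 * d2r
    c = (math.sin(phi1) * math.sin(phi2) * math.cos(theta1 - theta2)
         + math.cos(phi1) * math.cos(phi2))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp guards acos's domain

print(great_circle(0.0, 0.0, 90.0, 0.0))  # pi/2 ~ 1.5707963
```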


We use it to find the closest point in a list of points.

In [6]:
def closest_distance(latitude,longitude,points):
    d = 100000.0
    best = -1
    r = points.index
    for i in r:
        latitude_i = points.ix[i,'Latitude']
        longitude_i = points.ix[i,'Longitude']
        md = distance_on_unit_sphere(latitude, longitude, latitude_i, longitude_i)
        if d > md:
            best = i
            d = md
    return best,d


## Profiling¶

Let's run the above code for each test point

In [9]:
def test(track_points,test_points):
    test_points['Closest_point'] = 0
    test_points['Closest_distance'] = 0.0
    for i in range(len(test_points)):
        latitude_i = test_points.ix[i,'Latitude']
        longitude_i = test_points.ix[i,'Longitude']
        best, distance = closest_distance(latitude_i, longitude_i, track_points)
        test_points.ix[i,'Closest_point'] = best
        test_points.ix[i,'Closest_distance'] = distance
    return test_points

In [10]:
%timeit test(track_points,test_points)
%timeit test(track_points2,test_points2)

1 loops, best of 3: 3min 44s per loop
1 loops, best of 3: 3min 42s per loop


Where is this time spent? Let's see with a smaller data set.

In [10]:
%lprun -s -f closest_distance -f test test(create_points(2500),create_points(40))


We see that most of the time is spent in .ix indexing and in distance_on_unit_sphere

## Series As Numpy Arrays¶

The pandas documentation advises moving to Numpy arrays to speed up random access to elements. Let's do that here.
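For reference, `.values` on a pandas Series hands back the underlying Numpy array, so element access no longer goes through pandas indexing machinery. A minimal illustration (the column name is just an example):

```python
import pandas as pd

df = pd.DataFrame({'Latitude': [1.0, 2.0, 3.0]})
col = df['Latitude'].values  # plain numpy.ndarray backing the column

print(type(col).__name__, col[1])  # ndarray 2.0
```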

In [11]:
def closest_distance_numpy(latitude,longitude,m,points_latitude,points_longitude):
    d = 100000.0
    best = -1
    for i in range(m):
        latitude_i = points_latitude[i]
        longitude_i = points_longitude[i]
        md = distance_on_unit_sphere(latitude, longitude, latitude_i, longitude_i)
        if d > md:
            best = i
            d = md
    return best,d

def test_numpy(track_points,test_points):
    track_latitudes = track_points['Latitude'].values
    track_longitudes = track_points['Longitude'].values

    test_latitudes = test_points['Latitude'].values
    test_longitudes = test_points['Longitude'].values

    n = len(test_points)
    test_closest_point = np.ndarray(n,dtype=int)
    test_closest_distance = np.ndarray(n,dtype=float)

    m = len(track_points)
    for i in range(n):
        latitude_i = test_latitudes[i]
        longitude_i = test_longitudes[i]
        best, distance = closest_distance_numpy(latitude_i, longitude_i,
                                                m, track_latitudes, track_longitudes)
        test_closest_point[i] = best
        test_closest_distance[i] = distance

    test_points['Closest_point'] = test_closest_point
    test_points['Closest_distance'] = test_closest_distance
    return test_points


How does it perform?

In [12]:
%timeit test_numpy(track_points,test_points)
%timeit test_numpy(track_points2,test_points2)

1 loops, best of 3: 33.9 s per loop
1 loops, best of 3: 34.7 s per loop


Much better.

Profiling it shows that the bulk of the time is in the distance_on_unit_sphere function.

In [13]:
%lprun -f closest_distance_numpy -f test_numpy test_numpy(track_points,test_points)


## Compiling With Numba¶

Let's see if compiling the distance function speeds up the code.

In [14]:
@jit
def distance_on_unit_sphere_numba(lat1, long1, lat2, long2):

    # Convert latitude and longitude to
    # spherical coordinates in radians.
    degrees_to_radians = math.pi/180.0

    # phi = 90 - latitude
    phi1 = (90.0 - lat1)*degrees_to_radians
    phi2 = (90.0 - lat2)*degrees_to_radians

    # theta = longitude
    theta1 = long1*degrees_to_radians
    theta2 = long2*degrees_to_radians

    # Compute spherical distance from spherical coordinates.

    # For two locations in spherical coordinates
    # (1, theta, phi) and (1, theta', phi')
    # cosine( arc length ) =
    #    sin phi sin phi' cos(theta-theta') + cos phi cos phi'
    # distance = rho * arc length

    cos = (math.sin(phi1)*math.sin(phi2)*math.cos(theta1 - theta2) +
           math.cos(phi1)*math.cos(phi2))
    arc = math.acos( cos )

    # Remember to multiply arc by the radius of the earth
    # in your favorite set of units to get length.
    return arc

@jit
def closest_distance_numba(latitude,longitude,m,points_latitude,points_longitude):
    d = 100000.0
    best = -1
    for i in range(m):
        latitude_i = points_latitude[i]
        longitude_i = points_longitude[i]
        md = distance_on_unit_sphere_numba(latitude, longitude, latitude_i, longitude_i)
        if d > md:
            best = i
            d = md
    return best,d

def test_numba(track_points,test_points):
    track_latitudes = track_points['Latitude'].values
    track_longitudes = track_points['Longitude'].values

    test_latitudes = test_points['Latitude'].values
    test_longitudes = test_points['Longitude'].values

    n = len(test_points)
    test_closest_point = np.ndarray(n,dtype=int)
    test_closest_distance = np.ndarray(n,dtype=float)

    m = len(track_points)
    for i in range(n):
        latitude_i = test_latitudes[i]
        longitude_i = test_longitudes[i]
        best, distance = closest_distance_numba(latitude_i, longitude_i, m, track_latitudes, track_longitudes)
        test_closest_point[i] = best
        test_closest_distance[i] = distance

    test_points['Closest_point'] = test_closest_point
    test_points['Closest_distance'] = test_closest_distance
    return test_points

In [15]:
%timeit test_numba(track_points,test_points)
%timeit test_numba(track_points2,test_points2)

1 loops, best of 3: 943 ms per loop
1 loops, best of 3: 1.23 s per loop


Let's see if moving to Numpy functions is better

In [16]:
@jit
def distance_on_unit_sphere_numba(lat1, long1, lat2, long2):

    # Convert latitude and longitude to
    # spherical coordinates in radians.
    degrees_to_radians = np.pi/180.0

    # phi = 90 - latitude
    phi1 = (90.0 - lat1)*degrees_to_radians
    phi2 = (90.0 - lat2)*degrees_to_radians

    # theta = longitude
    theta1 = long1*degrees_to_radians
    theta2 = long2*degrees_to_radians

    # Compute spherical distance from spherical coordinates.

    # For two locations in spherical coordinates
    # (1, theta, phi) and (1, theta', phi')
    # cosine( arc length ) =
    #    sin phi sin phi' cos(theta-theta') + cos phi cos phi'
    # distance = rho * arc length

    cos = (np.sin(phi1)*np.sin(phi2)*np.cos(theta1 - theta2) +
           np.cos(phi1)*np.cos(phi2))
    arc = np.arccos( cos )

    # Remember to multiply arc by the radius of the earth
    # in your favorite set of units to get length.
    return arc

In [17]:
%timeit test_numba(track_points,test_points)
%timeit test_numba(track_points2,test_points2)

1 loops, best of 3: 937 ms per loop
1 loops, best of 3: 1.23 s per loop


No change in timing. Let's profile our top function.

In [18]:
%lprun -s -f test_numba test_numba(track_points,test_points)


99.7% of the time is spent in the closest_distance_numba function.

## Cython¶

In [68]:
%%cython
import pandas as pd
import numpy as np
import math

cimport cython

cdef double distance_on_unit_sphere_cython(double lat1, double long1, double lat2, double long2):
    cdef double degrees_to_radians, phi1, phi2, theta1, theta2, cos_value, arc_value, pi

    # Convert latitude and longitude to
    # spherical coordinates in radians.
    pi = math.pi
    degrees_to_radians = pi/180.0

    # phi = 90 - latitude
    phi1 = (90.0 - lat1)*degrees_to_radians
    phi2 = (90.0 - lat2)*degrees_to_radians

    # theta = longitude
    theta1 = long1*degrees_to_radians
    theta2 = long2*degrees_to_radians

    # Compute spherical distance from spherical coordinates.

    # For two locations in spherical coordinates
    # (1, theta, phi) and (1, theta', phi')
    # cosine( arc length ) =
    #    sin phi sin phi' cos(theta-theta') + cos phi cos phi'
    # distance = rho * arc length

    cos_value = (math.sin(phi1)*math.sin(phi2)*math.cos(theta1 - theta2) +
                 math.cos(phi1)*math.cos(phi2))
    arc_value = math.acos( cos_value )

    # Remember to multiply arc by the radius of the earth
    # in your favorite set of units to get length.
    return arc_value

@cython.boundscheck(False)
@cython.wraparound(False)
cdef closest_distance_cython( double latitude, double longitude, long m,
                              double[:] points_latitude,
                              double[:] points_longitude):
    cdef:
        double d, md, latitude_i, longitude_i
        long best, i

    d = 100000.0
    best = -1
    for i in range(m):
        latitude_i = points_latitude[i]
        longitude_i = points_longitude[i]
        md = distance_on_unit_sphere_cython(latitude, longitude, latitude_i, longitude_i)
        if d > md:
            best = i
            d = md
    return best,d

def test_cython(track_points,test_points):
    track_latitudes = track_points['Latitude'].values
    track_longitudes = track_points['Longitude'].values

    test_latitudes = test_points['Latitude'].values
    test_longitudes = test_points['Longitude'].values

    n = len(test_points)
    test_closest_point = np.ndarray(n,dtype=int)
    test_closest_distance = np.ndarray(n,dtype=float)

    m = len(track_points)
    for i in range(n):
        latitude_i = test_latitudes[i]
        longitude_i = test_longitudes[i]
        best, distance = closest_distance_cython(latitude_i, longitude_i, m, track_latitudes, track_longitudes)
        test_closest_point[i] = best
        test_closest_distance[i] = distance

    test_points['Closest_point'] = test_closest_point
    test_points['Closest_distance'] = test_closest_distance
    return test_points


Cython isn't nearly as fast as Numba here because the math functions aren't inlined.

In [20]:
%timeit test_cython(track_points,test_points)
%timeit test_cython(track_points2,test_points2)

1 loops, best of 3: 32.3 s per loop
1 loops, best of 3: 34 s per loop


## Using vectorized operations¶

The best function I could find 8 months ago used Numpy broadcast operations to avoid explicitly looping over the track points array. It is documented in Python Is Not C, and is reproduced below for comparison purposes.

In [23]:
#does not help: @jit
def closest_distance_pandas (lat1, long1, points):
    degrees_to_radians = math.pi/180.0
    phi1 = (90.0 - lat1)*degrees_to_radians
    phi2 = (90.0 - points['Latitude'])*degrees_to_radians
    theta1 = long1*degrees_to_radians
    theta2 = points['Longitude']*degrees_to_radians
    cos = (np.sin(phi1)*np.sin(phi2)*np.cos(theta1 - theta2) +
           np.cos(phi1)*np.cos(phi2))
    arc = np.arccos( cos )
    return arc.idxmin(), arc.min()

#does not help: @jit
def test_pandas(track_points,test_points):
    test_points['Closest_point'] = 0
    test_points['Closest_distance'] = 0.0
    for i in range(len(test_points)):
        latitude_i = test_points.ix[i,'Latitude']
        longitude_i = test_points.ix[i,'Longitude']
        best, distance = closest_distance_pandas(latitude_i, longitude_i, track_points)
        test_points.ix[i,'Closest_point'] = best
        test_points.ix[i,'Closest_distance'] = distance
    return test_points

In [24]:
%timeit test_pandas(track_points,test_points)
%timeit test_pandas(track_points2,test_points2)

1 loops, best of 3: 2.17 s per loop
1 loops, best of 3: 2.3 s per loop


Let's profile it.

In [25]:
%lprun -f test_pandas test_pandas(track_points,test_points)


The bulk of the time is spent in closest_distance_pandas, but .ix also shows up. As before, let's get rid of the .ix calls by switching to Numpy arrays.

## Using Numpy¶

In [26]:
def closest_distance_broadcast (lat1, long1, latitudes, longitudes):
    degrees_to_radians = math.pi/180.0
    phi1 = (90.0 - lat1)*degrees_to_radians
    phi2 = (90.0 - latitudes)*degrees_to_radians
    theta1 = long1*degrees_to_radians
    theta2 = longitudes*degrees_to_radians
    cos = (np.sin(phi1)*np.sin(phi2)*np.cos(theta1 - theta2) +
           np.cos(phi1)*np.cos(phi2))
    arc = np.arccos( cos )
    return arc.argmin(), arc.min()

def test_broadcast(track_points,test_points):
    track_latitudes = track_points['Latitude'].values
    track_longitudes = track_points['Longitude'].values

    test_latitudes = test_points['Latitude'].values
    test_longitudes = test_points['Longitude'].values

    n = len(test_points)
    test_closest_point = np.ndarray(n,dtype=int)
    test_closest_distance = np.ndarray(n,dtype=float)

    for i in range(n):
        latitude_i = test_latitudes[i]
        longitude_i = test_longitudes[i]
        best, distance = closest_distance_broadcast(latitude_i, longitude_i, track_latitudes, track_longitudes)
        test_closest_point[i] = best
        test_closest_distance[i] = distance

    test_points['Closest_point'] = test_closest_point
    test_points['Closest_distance'] = test_closest_distance
    return test_points

In [27]:
%timeit test_broadcast(track_points,test_points)
%timeit test_broadcast(track_points2,test_points2)

1 loops, best of 3: 797 ms per loop
1 loops, best of 3: 806 ms per loop


Almost 3 times faster!

Let's profile it.

In [32]:
%lprun -s -f test_broadcast -f closest_distance_broadcast test_broadcast(track_points,test_points)


All the time is spent in broadcast math functions.

In [33]:
@jit
def closest_distance_broadcast_numba (lat1, long1, latitudes, longitudes):
    # Convert latitude and longitude to
    # spherical coordinates in radians.
    degrees_to_radians = math.pi/180.0

    # phi = 90 - latitude
    phi1 = (90.0 - lat1)*degrees_to_radians
    phi2 = (90.0 - latitudes)*degrees_to_radians

    # theta = longitude
    theta1 = long1*degrees_to_radians
    theta2 = longitudes*degrees_to_radians

    # Compute spherical distance from spherical coordinates.

    # For two locations in spherical coordinates
    # (1, theta, phi) and (1, theta', phi')
    # cosine( arc length ) =
    #    sin phi sin phi' cos(theta-theta') + cos phi cos phi'
    # distance = rho * arc length

    cos = (math.sin(phi1)*np.sin(phi2)*np.cos(theta1 - theta2) +
           math.cos(phi1)*np.cos(phi2))
    arc = np.arccos( cos )

    # Return the index of the smallest value in arc
    return arc.argmin(), arc.min()

def test_broadcast_numba(track_points,test_points):
    track_latitudes = track_points['Latitude'].values
    track_longitudes = track_points['Longitude'].values

    test_latitudes = test_points['Latitude'].values
    test_longitudes = test_points['Longitude'].values

    n = len(test_points)
    test_closest_point = np.ndarray(n,dtype=int)
    test_closest_distance = np.ndarray(n,dtype=float)

    for i in range(n):
        latitude_i = test_latitudes[i]
        longitude_i = test_longitudes[i]
        best, distance = closest_distance_broadcast_numba(latitude_i, longitude_i, track_latitudes, track_longitudes)
        test_closest_point[i] = best
        test_closest_distance[i] = distance

    test_points['Closest_point'] = test_closest_point
    test_points['Closest_distance'] = test_closest_distance
    return test_points

In [34]:
%timeit test_broadcast_numba(track_points,test_points)
%timeit test_broadcast_numba(track_points2,test_points2)

1 loops, best of 3: 614 ms per loop
1 loops, best of 3: 678 ms per loop


## Cython¶

Compiling with Cython instead of Numba.

In [48]:
%%cython
import pandas as pd
import numpy as np
import math

cimport cython
cimport numpy as np

def closest_distance_broadcast_cython (double lat1, double long1,
                                       np.ndarray[double,ndim=1] latitudes,
                                       np.ndarray[double,ndim=1] longitudes):
    cdef:
        double degrees_to_radians, phi1, theta1
        np.ndarray[double, ndim=1] phi2, theta2, cos, arc

    # Convert latitude and longitude to
    # spherical coordinates in radians.
    degrees_to_radians = math.pi/180.0

    # phi = 90 - latitude
    phi1 = (90.0 - lat1)*degrees_to_radians
    phi2 = (90.0 - latitudes)*degrees_to_radians

    # theta = longitude
    theta1 = long1*degrees_to_radians
    theta2 = longitudes*degrees_to_radians

    # Compute spherical distance from spherical coordinates.

    # For two locations in spherical coordinates
    # (1, theta, phi) and (1, theta', phi')
    # cosine( arc length ) =
    #    sin phi sin phi' cos(theta-theta') + cos phi cos phi'
    # distance = rho * arc length

    cos = (np.sin(phi1)*np.sin(phi2)*np.cos(theta1 - theta2) +
           np.cos(phi1)*np.cos(phi2))
    arc = np.arccos( cos )

    # Return the index of the smallest value in arc
    return arc.argmin(), arc.min()

def test_broadcast_cython(track_points,test_points):
    track_latitudes = track_points['Latitude'].values
    track_longitudes = track_points['Longitude'].values

    test_latitudes = test_points['Latitude'].values
    test_longitudes = test_points['Longitude'].values

    n = len(test_points)
    test_closest_point = np.ndarray(n,dtype=int)
    test_closest_distance = np.ndarray(n,dtype=float)

    for i in range(n):
        latitude_i = test_latitudes[i]
        longitude_i = test_longitudes[i]
        best, distance = closest_distance_broadcast_cython(latitude_i, longitude_i, track_latitudes, track_longitudes)
        test_closest_point[i] = best
        test_closest_distance[i] = distance

    test_points['Closest_point'] = test_closest_point
    test_points['Closest_distance'] = test_closest_distance
    return test_points


Cython does not speed up the code.

In [49]:
%timeit test_broadcast_cython(track_points, test_points)

1 loops, best of 3: 832 ms per loop
1 loops, best of 3: 822 ms per loop


## Manhattan Distance¶

In Python Is Not C, I decided to use a much simpler distance, namely the Manhattan distance. The code becomes:

In [28]:
def closest_manhattan(lat,lon,trkpts):
    cl = np.abs(trkpts.Latitude - lat) + np.abs(trkpts.Longitude - lon)
    return cl.idxmin(), cl.min()

def test_manhattan(track_points,test_points):
    test_points['Closest_point'] = 0
    test_points['Closest_distance'] = 0.0
    for i in range(len(test_points)):
        latitude_i = test_points.ix[i,'Latitude']
        longitude_i = test_points.ix[i,'Longitude']
        best, distance = closest_manhattan(latitude_i, longitude_i, track_points)
        test_points.ix[i,'Closest_point'] = best
        test_points.ix[i,'Closest_distance'] = distance
    return test_points

In [29]:
%timeit test_manhattan(track_points,test_points)
%timeit test_manhattan(track_points2,test_points2)

1 loops, best of 3: 809 ms per loop
1 loops, best of 3: 884 ms per loop


This is almost as fast as the Numba-compiled code above. Let's profile it.

In [52]:
%lprun -f test_manhattan test_manhattan(track_points,test_points)


About 40% of the time is spent in .ix. As before, let's move to Numpy arrays.

In [53]:
def closest_manhattan_numpy(lat,lon,latitudes,longitudes):
    cl = np.abs(latitudes - lat) + np.abs(longitudes - lon)
    return cl.argmin(), cl.min()

def test_manhattan_numpy(track_points,test_points):
    track_latitudes = track_points['Latitude'].values
    track_longitudes = track_points['Longitude'].values

    test_latitudes = test_points['Latitude'].values
    test_longitudes = test_points['Longitude'].values

    n = len(test_points)
    test_closest_point = np.ndarray(n,dtype=int)
    test_closest_distance = np.ndarray(n,dtype=float)

    for i in range(n):
        latitude_i = test_latitudes[i]
        longitude_i = test_longitudes[i]
        best, distance = closest_manhattan_numpy(latitude_i, longitude_i, track_latitudes, track_longitudes)
        test_closest_point[i] = best
        test_closest_distance[i] = distance

    test_points['Closest_point'] = test_closest_point
    test_points['Closest_distance'] = test_closest_distance
    return test_points


This is way faster.

In [54]:
%timeit test_manhattan_numpy(track_points,test_points)
%timeit test_manhattan_numpy(track_points2,test_points2)

10 loops, best of 3: 52.2 ms per loop
10 loops, best of 3: 76.7 ms per loop


Profiling it confirms that most of the time is now spent in closest_manhattan_numpy

In [57]:
%lprun -f test_manhattan_numpy test_manhattan_numpy(track_points,test_points)


Let's compile it with Numba.

In [58]:
@jit
def closest_manhattan_numba(latitude,longitude,m,points_latitude,points_longitude):
    d = 100000.0
    best = -1
    for i in range(m):
        latitude_i = points_latitude[i]
        longitude_i = points_longitude[i]
        md = abs(latitude - latitude_i) + abs(longitude - longitude_i)
        if d > md:
            best = i
            d = md
    return best,d

def test_manhattan_numba(track_points,test_points):
    track_latitudes = track_points['Latitude'].values
    track_longitudes = track_points['Longitude'].values

    test_latitudes = test_points['Latitude'].values
    test_longitudes = test_points['Longitude'].values

    n = len(test_points)
    test_closest_point = np.ndarray(n,dtype=int)
    test_closest_distance = np.ndarray(n,dtype=float)

    m = len(track_points)
    for i in range(n):
        latitude_i = test_latitudes[i]
        longitude_i = test_longitudes[i]
        best, distance = closest_manhattan_numba(latitude_i, longitude_i, m, track_latitudes, track_longitudes)
        test_closest_point[i] = best
        test_closest_distance[i] = distance

    test_points['Closest_point'] = test_closest_point
    test_points['Closest_distance'] = test_closest_distance
    return test_points


This yields another 2x speedup.

In [60]:
%timeit test_manhattan_numba(track_points,test_points)
%timeit test_manhattan_numba(track_points2,test_points2)

10 loops, best of 3: 30.4 ms per loop
10 loops, best of 3: 35.7 ms per loop


We can try Numpy broadcasting with this simpler distance function.

In [61]:
def test_manhattan_broadcast(track_points,test_points):
    track_latitudes = track_points['Latitude'].values
    track_longitudes = track_points['Longitude'].values

    test_latitudes = test_points['Latitude'].values
    test_longitudes = test_points['Longitude'].values

    t1 = test_latitudes[:,None] - track_latitudes
    t2 = test_longitudes[:,None] - track_longitudes
    distances = np.abs(t1) + np.abs(t2)

    test_points['Closest_point'] = distances.argmin(axis=1)
    test_points['Closest_distance'] = distances.min(axis=1)
    return test_points

In [62]:
%timeit test_manhattan_broadcast(track_points,test_points)
%timeit test_manhattan_broadcast(track_points2,test_points2)

1 loops, best of 3: 196 ms per loop
1 loops, best of 3: 229 ms per loop
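The trick above is `test_latitudes[:,None] - track_latitudes`: adding a trailing axis turns the (n,) test array into an (n, 1) column, which Numpy broadcasts against the (m,) track array to produce all n*m pairwise differences in one operation. A minimal sketch with made-up values:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])  # n = 3 "test" values
b = np.array([10.0, 20.0])     # m = 2 "track" values

diff = a[:, None] - b          # broadcast to shape (3, 2)
print(diff.shape)              # (3, 2)
print(diff[0])                 # first row: [1-10, 1-20]
```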


## Using Scipy And Sklearn¶

Let's now use built-in packages for finding nearest neighbors.

In [5]:
import scipy.spatial
import sklearn.neighbors


We need to copy track points coordinates in a 2D array.

In [6]:
points = np.zeros((len(track_points),2))
points[:,0] = track_points['Latitude']
points[:,1] = track_points['Longitude']


We can now benchmark four built-in functions. These functions start by building a tree over the track points. Let's measure the time that takes, as this step is not required with the code above.
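All four classes follow the same build-then-query pattern; here is a minimal sketch with `cKDTree` (the sample points are made up):

```python
import numpy as np
import scipy.spatial

pts = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
tree = scipy.spatial.cKDTree(pts)     # build once over the track points

dist, idx = tree.query([[0.9, 1.2]])  # nearest neighbor of each query point
print(idx[0])  # 1: the point (1.0, 1.0) is closest
```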

In [7]:
%timeit scipy.spatial.cKDTree(points)
%timeit scipy.spatial.KDTree(points)
%timeit sklearn.neighbors.KDTree(points,metric='minkowski',p=1)
%timeit sklearn.neighbors.BallTree(points,metric='minkowski',p=1)

100 loops, best of 3: 10.9 ms per loop
10 loops, best of 3: 134 ms per loop
100 loops, best of 3: 11 ms per loop
100 loops, best of 3: 10.4 ms per loop


Let's create the trees for the rest of the benchmark.

In [8]:
t1 = scipy.spatial.cKDTree(points)
t2 = scipy.spatial.KDTree(points)
t3 = sklearn.neighbors.KDTree(points,metric='minkowski',p=1)
t4 = sklearn.neighbors.BallTree(points,metric='minkowski',p=1)


A utility function for the benchmark. It is convenient that all these trees have a similar API.

In [20]:
def test_tree(tree, test_points):
    tests = np.zeros((len(test_points),2))
    tests[:,0] = test_points['Latitude']
    tests[:,1] = test_points['Longitude']

    distances, closests = tree.query(tests)

    test_points['Closest_point'] = closests
    test_points['Closest_distance'] = distances
    return test_points


In [23]:
%timeit test_tree(t1, test_points)
%timeit test_tree(t2, test_points)
%timeit test_tree(t3, test_points)
%timeit test_tree(t4, test_points)

1000 loops, best of 3: 800 µs per loop
10 loops, best of 3: 83.7 ms per loop
1000 loops, best of 3: 1.11 ms per loop
1000 loops, best of 3: 1.91 ms per loop


Let's test with the RAAM data set.

In [24]:
t1 = scipy.spatial.cKDTree(points2)
t2 = scipy.spatial.KDTree(points2)
t3 = sklearn.neighbors.KDTree(points2,metric='minkowski',p=1)
t4 = sklearn.neighbors.BallTree(points2,metric='minkowski',p=1)


Timing tree creation

In [25]:
%timeit scipy.spatial.cKDTree(points2)
%timeit scipy.spatial.KDTree(points2)
%timeit sklearn.neighbors.KDTree(points2,metric='minkowski',p=1)
%timeit sklearn.neighbors.BallTree(points2,metric='minkowski',p=1)

1 loops, best of 3: 802 ms per loop
10 loops, best of 3: 205 ms per loop
1 loops, best of 3: 967 ms per loop
1 loops, best of 3: 971 ms per loop


Timing closest distance

In [26]:
%timeit test_tree(t1, test_points2)
%timeit test_tree(t2, test_points2)
%timeit test_tree(t3, test_points2)
%timeit test_tree(t4, test_points2)

The slowest run took 91.35 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 759 µs per loop
10 loops, best of 3: 111 ms per loop
1000 loops, best of 3: 1.02 ms per loop
1000 loops, best of 3: 1.29 ms per loop
