Interaction with other libraries


  • It's a very romantic notion to think that we can come up with the best features to model our world. That notion has now been dispelled.
  • Most object detection/labeling/segmentation/classification tasks now have neural network equivalent algorithms that perform on-par with or better than hand-crafted methods.
  • One library that gives Python users particularly easy access to deep learning is Keras (it works with both Theano and TensorFlow).
  • At SciPy2017, see "Fully Convolutional Networks for Image Segmentation" by Daniil Pakhomov (Friday, 2:30pm)
    • Particularly interesting, because such networks can be applied to images of any size
    • ... and because Daniil is a scikit-image contributor ;)



E.g., the Keras documentation shows how to fine-tune a model on top of InceptionV3.

  • In the Keras docs, you may read about image_data_format. By default this is channels_last, which is compatible with scikit-image's (rows, cols, ch) storage order.
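You can verify the setting with the Keras backend API:
In [ ]:
from keras import backend as K

# 'channels_last' means images are stored as (rows, cols, channels),
# matching scikit-image
K.image_data_format()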
In [ ]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

import matplotlib.pyplot as plt
%matplotlib inline

## Generate dummy data
#X_train = np.random.random((1000, 2))
#y_train = np.random.randint(2, size=(1000, 1))
#X_test = np.random.random((100, 2))
#y_test = np.random.randint(2, size=(100, 1))

## Generate dummy data with some structure

from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.make_classification(n_features=2, n_samples=2000, n_redundant=0, n_informative=1,
                                    n_clusters_per_class=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

model = Sequential()
model.add(Dense(64, input_dim=2, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=128)
score = model.evaluate(X_test, y_test, batch_size=128)

print('\n\nAccuracy:', score[1]);
In [ ]:
from sklearn.ensemble import RandomForestClassifier
In [ ]:
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
rf.score(X_test, y_test)
In [ ]:
f, (ax0, ax1, ax2) = plt.subplots(1, 3, figsize=(15, 5))

mask = (y_train == 0)
ax0.plot(X_train[mask, 0], X_train[mask, 1], 'b.')
ax0.plot(X_train[~mask, 0], X_train[~mask, 1], 'r.')
ax0.set_title('True Labels')

y_nn = model.predict_classes(X_test).flatten()
mask = (y_nn == 0)
ax1.plot(X_test[mask, 0], X_test[mask, 1], 'b.')
ax1.plot(X_test[~mask, 0], X_test[~mask, 1], 'r.')
ax1.set_title('Labels by neural net')

y_rf = rf.predict(X_test)
mask = (y_rf == 0)
ax2.plot(X_test[mask, 0], X_test[mask, 1], 'b.')
ax2.plot(X_test[~mask, 0], X_test[~mask, 1], 'r.')
ax2.set_title('Labels by random forest');
In [ ]:
from keras.applications.inception_v3 import InceptionV3, preprocess_input, decode_predictions
net = InceptionV3()
In [ ]:
from skimage import img_as_float, transform

def inception_predict(image):
    # Rescale image to 299x299, as required by InceptionV3
    image_prep = transform.resize(image, (299, 299, 3), mode='reflect')
    # Scale image values to [-1, 1], as required by InceptionV3
    image_prep = (img_as_float(image_prep) - 0.5) * 2
    predictions = decode_predictions(
        net.predict(image_prep[None, ...])
    )

    plt.imshow(image, cmap='gray')
    for pred in predictions[0]:
        (n, klass, prob) = pred
        print(f'{klass:>15} ({prob:.3f})')
In [ ]:
from skimage import data
In [ ]:
# Try the classifier on scikit-image's sample images (illustrative choices)
inception_predict(data.chelsea())
In [ ]:
inception_predict(data.coffee())

You can fine-tune Inception to classify your own classes, as described in the Keras applications documentation.
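In outline, that recipe replaces the classification head and freezes the pre-trained layers. A sketch following the Keras applications example (the 10-class head is arbitrary):
In [ ]:
from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

# Load InceptionV3 without its ImageNet classification head
base = InceptionV3(weights='imagenet', include_top=False)

# Freeze the pre-trained layers so only the new head is trained at first
for layer in base.layers:
    layer.trainable = False

# Attach a new head; the 10-class softmax is purely illustrative
x = GlobalAveragePooling2D()(base.output)
x = Dense(1024, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

model = Model(inputs=base.input, outputs=predictions)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')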

SciPy's LowLevelCallable

scipy.ndimage.generic_filter accepts an arbitrary Python callable as its kernel, which is flexible but slow: the function is invoked once per pixel. Compare the built-in grey erosion against an equivalent generic_filter:
In [ ]:
import numpy as np
image = np.random.random((512, 512))

footprint = np.array([[0, 1, 0],
                      [1, 1, 1],
                      [0, 1, 0]], dtype=bool)
In [ ]:
from scipy import ndimage as ndi
%timeit ndi.grey_erosion(image, footprint=footprint)
In [ ]:
%timeit ndi.generic_filter(image, np.min, footprint=footprint)
In [ ]:
# Ratio of the two timings measured above
f'Slowdown is {825 / 2.85} times'
In [ ]:
%load_ext Cython
In [ ]:
%%cython --name=test9

from libc.stdint cimport intptr_t
from numpy.math cimport INFINITY

cdef api int erosion_kernel(double* input_arr_1d, intptr_t filter_size,
                            double* return_value, void* user_data):
    # Erosion is the minimum over the footprint; this runs at C speed,
    # with no Python-object overhead per pixel
    cdef ssize_t i
    return_value[0] = INFINITY
    for i in range(filter_size):
        if input_arr_1d[i] < return_value[0]:
            return_value[0] = input_arr_1d[i]
    return 1
In [ ]:
from scipy import LowLevelCallable, ndimage
import sys

def erosion_fast(image, footprint):
    out = ndimage.generic_filter(
        image,
        LowLevelCallable.from_cython(sys.modules['test9'], name='erosion_kernel'),
        footprint=footprint
    )
    return out
In [ ]:
# Check that the fast version matches the pure-Python one
np.sum(np.abs(
    erosion_fast(image, footprint=footprint)
    - ndi.generic_filter(image, np.min, footprint=footprint)
))
In [ ]:
%timeit erosion_fast(image, footprint=footprint)
In [ ]:
!pip install numba
In [ ]:
# Taken from Juan Nunez-Iglesias's blog post on SciPy's LowLevelCallable

import numba
from numba import cfunc, carray
from numba.types import intc, CPointer, float64, intp, voidptr
from scipy import LowLevelCallable

def jit_filter_function(filter_function):
    """Decorator that compiles a Python filter function to a LowLevelCallable."""
    jitted_function = numba.jit(filter_function, nopython=True)

    @cfunc(intc(CPointer(float64), intp, CPointer(float64), voidptr))
    def wrapped(values_ptr, len_values, result, data):
        # View the raw C pointer as a 1D float64 array, then apply the filter
        values = carray(values_ptr, (len_values,), dtype=float64)
        result[0] = jitted_function(values)
        return 1

    return LowLevelCallable(wrapped.ctypes)
In [ ]:
@jit_filter_function
def fmin(values):
    result = np.inf
    for v in values:
        if v < result:
            result = v
    return result
In [ ]:
%timeit ndi.generic_filter(image, fmin, footprint=footprint)

Parallel and batch processing

Joblib (developed by scikit-learn) is used for:

  1. transparent disk-caching of the output values and lazy re-evaluation (memoize pattern)
  2. simple, easy parallel computing (see the sketch after this list)
  3. logging and tracing of the execution
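Feature 2, for example, looks like this (a minimal sketch, assuming joblib is installed):
In [ ]:
from math import sqrt
from joblib import Parallel, delayed

# Evaluate sqrt on ten inputs across two worker processes
Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))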
In [ ]:
from joblib import Memory
mem = Memory(location='/tmp/joblib')
In [ ]:
from skimage import segmentation

@mem.cache
def cached_slic(image):
    return segmentation.slic(image)
In [ ]:
from skimage import io
large_image = io.imread('../images/Bells-Beach.jpg')
In [ ]:
%time segmentation.slic(large_image)
In [ ]:
%time cached_slic(large_image)
In [ ]:
%time cached_slic(large_image)

Dask is a parallel computing library. It has two components:

  • Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.
  • “Big Data” collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of the dynamic task schedulers.
  • See Matt Rocklin's blog post for a more detailed example; a small taste of the array interface is sketched below.
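A minimal sketch of a Dask array (assuming dask is installed; the sizes are arbitrary):
In [ ]:
import dask.array as da

# A 20000x20000 random array split into 1000x1000 chunks; operations
# build a lazy task graph rather than executing immediately
x = da.random.random((20000, 20000), chunks=(1000, 1000))
result = (x - x.mean(axis=0)).sum()

# .compute() hands the graph to the scheduler, which runs it in parallel
result.compute()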