from dask_kubernetes import KubeCluster
cluster = KubeCluster(n_workers=40)  # launch 40 Dask workers as pods on the Kubernetes cluster
cluster
from dask.distributed import Client, progress
client = Client(cluster)
%matplotlib inline
from dask import delayed
import skimage.io
import matplotlib.pyplot as plt
sample = skimage.io.imread('https://storage.googleapis.com/pangeo-data/FIB-25/iso.00000.png')
fig = plt.figure(figsize=(10, 10))
plt.imshow(sample, cmap='gray')
from dask import delayed
import dask.array as da
# filenames = ['https://storage.googleapis.com/pangeo-data/FIB-25/iso.%05d.png' % i for i in range(0, 8090)]
filenames = ['https://storage.googleapis.com/pangeo-data/FIB-25/iso.%05d.png' % i for i in range(0, 1000)]
images = [delayed(skimage.io.imread)(fn) for fn in filenames]  # lazy reads, one delayed task per image
arrays = [da.from_delayed(im, shape=sample.shape, dtype=sample.dtype) for im in images]  # wrap each lazy read as a dask array
x = da.stack(arrays, axis=0)  # stack into one 3D array: (image, row, column)
x
Currently our data is organized into chunks of size 1x6000x6000, one chunk for each image. If we want to do per-pixel analysis then we need to rechunk our data, which we do below into chunks of size 1000x100x100 (a quick check of the resulting layout follows the persist step).
x = x.rechunk((1000, 100, 100))  # pencil-shaped chunks: all images for each 100x100 patch of pixels
x = x.persist()                  # load the rechunked data into distributed memory across the workers
progress(x)                      # show a progress bar while the chunks materialize
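As an optional sanity check, and as a minimal sketch relying only on the standard dask array attributes chunksize, numblocks, and nbytes, we can confirm the new chunk layout before running the FFT:
print(x.chunksize)      # expected (1000, 100, 100) after the rechunk
print(x.numblocks)      # number of chunks along each axis
print(x.nbytes / 1e9)   # total size in GB, to compare against cluster memory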
y = da.fft.fft(x, axis=0)  # FFT along the stack axis, i.e. one transform per pixel
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure(figsize=(20, 5))
plt.plot(abs(y[:, 3000, 3000]).compute())  # magnitude of the spectrum for a single pixel
plt.show()