#!/usr/bin/env python # coding: utf-8 # Images # ==== # An ``Images`` object is a collection of either 2D images or 3D volumes. Under the hood, it wraps an n-dimensional array, and supports either distributed operations via Spark or local operations via `numpy`, with an identical API. It supports several simple manipulations of image content, exporting image data, and conversion to other formats. # This example can be run locally and does not require Spark. # ## Setup imports # In[1]: get_ipython().run_line_magic('matplotlib', 'inline') # In[19]: import matplotlib.pyplot as plt import seaborn as sns sns.set_style('darkgrid') sns.set_context('notebook') from showit import image, tile # In[3]: import thunder as td # ## Loading images # `images` data can be loaded using the `td.images.from*` methods, which support loading from a few different formats. Here we'll load example data. # In[4]: data = td.images.fromexample('fish') # We can inspect the object to see basic properties like `shape`, `dtype`, and whether it's in `'local'` or `'spark'` mode. # In[5]: data # These are 20 3d volumes, each one with shape `(2, 76, 87)`. Let's look at the first volume # In[6]: tile(data[0]); # Note that, although `data` is not itself an array, we can index into it using bracket notation, and pass it as input to plotting methods that expect arrays, because it will be automatically converted. # For an example of 2D data, we can load another one of the examples # In[7]: data = td.images.fromexample('mouse') # Look at the first image # In[8]: image(data[0]); # ## Image manipulations # # An ``Images`` object has a variety of methods for manipulation, all of which are automatically parallelized across images if running on a cluster. # One common manipulation on volumetric data is computing a maximum projection across the z dimension. # In[9]: data = td.images.fromexample('fish') # In[10]: projections = data.max_projection(axis=0) image(projections[0]); # We can also subselect a set of planes, specifying the top and bottom of the desired range: # In[11]: subset = data[0, 0, :, :] image(subset); # And we can subsample in space: # In[12]: subsampled = data.subsample([1,5,5])[0, 0, :, :] image(subsampled); # Finally, we can perform operations that aggregate across images. For example, computing the standard deviation: # In[13]: statistic = data.std()[0, 0, :, :] image(statistic); # The result of image operations can be saved by exporting each image to a `png` or `tif` file. # ``` # data.max_projection(axis=0).topng('directory') # data.max_projection(axis=0).totif('directory') # ``` # ## Conversions # # We commonly encounter images or volumes that vary over time, e.g. from a movie. It can be useful to convert these data into a `Series` object: another wrapper for n-dimensional arrays designed to work with collections of one-dimensional indexed records, often time series. # Here we load our `image` data and convert to `series` # In[14]: data = td.images.fromexample('fish') ts = data.toseries() # Let's check properties of the `Series` to make sure the conversion makes sense. We have twenty images, so there should be twenty time points. # In[15]: ts.index # The shape should be the original pixel dimensions `(2, 76, 87)` and the time dimension `(20)` # In[16]: ts.shape # This conversion from `images` to `series` is essentially a transpose, but in the distributed setting it uses an efficient blocked representation. # If we want to collapse the pixel dimensions, we can use `flatten` # In[17]: ts.flatten().shape # We can also quickly look at some example time series, after filtering on standard deviation and normalizing. # In[18]: samples = ts.flatten().filter(lambda x: x.std() > 6).normalize().sample(n=50).toarray() plt.plot(samples.T); # For a large data set that will be analyzed repeatedly as `series`, it might be faster and more convienient to save `images` data to a collection of flat binary files on a distributed file system, which can in turn be read back in directly as `series` data. This can be performed as follows: # ``` # data = td.images.fromexample('fish') # data.toseries().tobinary('directory', overwrite=True) # ts = td.series.frombinary('directory') # ``` # There are more methods on `images` data, and algorithm packages that take `images` data as input. See the other tutorials and documentation for more!