Kevin developed this course, and a great part of the slides are his work.
**Experimental Design**: finding the right technique and picking the right dyes and samples has stayed relatively consistent; if anything, better techniques lead to more demanding scientists.
**Management**: storing, backing up, and setting up databases have become easier and more automated even as data magnitudes have grown.
**Measurements**: the actual acquisition speed of the data has increased wildly thanks to better detectors, parallel measurement, and new higher-intensity sources.
**Post Processing**: this portion is the most time-consuming and difficult and has seen minimal improvement over the last years.
If you looked at one 1000 x 1000 pixel image
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
plt.matshow(np.random.uniform(size = (1000, 1000)),
cmap = 'viridis')
every second, it would take you
# assuming 16 bit images and a 'metric' terabyte
time_per_tb=1e12/(1000*1000*16/8) / (60*60)
print("%04.1f hours to view a terabyte" % (time_per_tb))
138.9 hours to view a terabyte
import json, pandas as pd
course_df = pd.read_json('../common/schedule.json')
course_df['Date'] = course_df['Lecture'].map(lambda x: x.split('-')[0])
course_df['Title'] = course_df['Lecture'].map(lambda x: x.split('-')[-1])
course_df[['Date', 'Title', 'Description']]
 | Date | Title | Description |
---|---|---|---|
0 | 20th February | Introduction and Workflows | Basic overview of the course, introduction to ... |
1 | 27th February | Ground Truth: Building and Augmenting Datasets | Examples of large datasets, how they were buil... |
2 | 5th March | Image Enhancement | Overview of what techniques are available for ... |
3 | 12th March | Basic Segmentation, Discrete Binary Structures | How to convert images into structures, startin... |
4 | 19th March | Advanced Segmentation | More advanced techniques for extracting struct... |
5 | 26th March | Supervised Problems and Segmentation | More advanced techniques for extracting struct... |
6 | 2nd April | Analyzing Single Objects, Shape, and Texture | The analysis and characterization of single st... |
7 | 9th April | Analyzing Complex Objects and Distributions | What techniques are available to analyze more ... |
8 | 16th April | Dynamic Experiments | Performing tracking and registration in dynami... |
9 | 23rd April | Statistics, Prediction, and Reproducibility | Making a statistical analysis from quantified ... |
10 | 30th April | Guest Lectures | How Roche does Microscopy at Scale with High C... |
11 | 7th May | Scaling Up / Big Data | Performing large scale analyses on clusters an... |
12 | 14th May | Project Presentations | You present your projects |
course_df[['Title', 'Description', 'Applications']][3:6].T
 | 3 | 4 | 5 |
---|---|---|---|
Title | Basic Segmentation, Discrete Binary Structures | Advanced Segmentation | Supervised Problems and Segmentation |
Description | How to convert images into structures, startin... | More advanced techniques for extracting struct... | More advanced techniques for extracting struct... |
Applications | Identify cells from noise, background, and dust | Identifying fat and ice crystals in ice cream ... | Identifying fat and ice crystals in ice cream ... |
course_df[['Title', 'Description', 'Applications']][6:9].T
 | 6 | 7 | 8 |
---|---|---|---|
Title | Analyzing Single Objects, Shape, and Texture | Analyzing Complex Objects and Distributions | Dynamic Experiments |
Description | The analysis and characterization of single st... | What techniques are available to analyze more ... | Performing tracking and registration in dynami... |
Applications | Count cells and determine their average shape ... | Separate clumps of cells, analyze vessel netwo... | Turning a video of foam flow into metrics like... |
course_df[['Title', 'Description', 'Applications']][9:12].T
 | 9 | 10 | 11 |
---|---|---|---|
Title | Statistics, Prediction, and Reproducibility | Guest Lectures | Scaling Up / Big Data |
Description | Making a statistical analysis from quantified ... | How Roche does Microscopy at Scale with High C... | Performing large scale analyses on clusters an... |
Applications | Determine if/how different a cancerous cell is... | Robust analysis of millions of images for maki... | Performing large scale analyses using ETHs clu... |
course_df[['Title', 'Description', 'Applications']][12:13].T
 | 12 |
---|---|
Title | Project Presentations |
Description | You present your projects |
Applications |
A very abstract definition: an image is a pairing between spatial information (position) and some other kind of information (value).
In most cases this is a 2- or 3-dimensional position (x, y, z coordinates) and a numeric value (intensity)
The world is continuous, but the computer needs discrete positions and discrete intensity levels.
import numpy as np
import matplotlib.pyplot as plt
from skimage.transform import resize
img=np.load('../common/data/wood.npy');
plt.figure(figsize=[15,7])
plt.subplot(2,3,1); plt.imshow(img); plt.title('Original')
downsize = 2; plt.subplot(2,3,2); plt.imshow(resize(img,(img.shape[0] // downsize, img.shape[1] // downsize), anti_aliasing=False)); plt.title('Downsize {0}x{0}'.format(downsize))
downsize = 32; plt.subplot(2,3,3); plt.imshow(resize(img,(img.shape[0] // downsize, img.shape[1] // downsize),anti_aliasing=False)); plt.title('Downsize {0}x{0}'.format(downsize))
levels = 16; plt.subplot(2,3,5); plt.imshow(np.floor(img*levels)); plt.title('{0} Levels'.format(levels));
levels = 4 ; plt.subplot(2,3,6); plt.imshow(np.floor(img*levels)); plt.title('{0} Levels'.format(levels));
import numpy as np
import pandas as pd
basic_image = np.random.choice(range(100), size=(5, 5))
xx, yy = np.meshgrid(range(basic_image.shape[1]), range(basic_image.shape[0]))
image_df = pd.DataFrame(dict(x=xx.ravel(),
                             y=yy.ravel(),
                             Intensity=basic_image.ravel()))
image_df[['x', 'y', 'Intensity']].head(5)
 | x | y | Intensity |
---|---|---|---|
0 | 0 | 0 | 33 |
1 | 1 | 0 | 78 |
2 | 2 | 0 | 5 |
3 | 3 | 0 | 73 |
4 | 4 | 0 | 22 |
import matplotlib.pyplot as plt
plt.imshow(basic_image, cmap = 'gray')
plt.colorbar();
The next step is to apply a color map (also called lookup table, LUT) to the image
fig, ax1 = plt.subplots(1,1)
plot_image = ax1.matshow(basic_image, cmap = 'Blues')
plt.colorbar(plot_image)
for _, c_row in image_df.iterrows():
    ax1.text(c_row['x'], c_row['y'], s='%02d' % c_row['Intensity'],
             fontdict=dict(color='r'))
Color maps can be arbitrarily defined based on how we would like to visualize the information in the image
fig, ax1 = plt.subplots(1,1)
plot_image = ax1.matshow(basic_image, cmap = 'jet')
plt.colorbar(plot_image)
fig, ax1 = plt.subplots(1,1)
plot_image = ax1.matshow(basic_image, cmap = 'hot')
plt.colorbar(plot_image)
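Beyond the built-in maps, a colormap really can be defined arbitrarily. A minimal sketch using matplotlib's `LinearSegmentedColormap` (the name `bone_density` and the color stops are made up for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

# a made-up map running black -> red -> white
custom_cmap = LinearSegmentedColormap.from_list(
    'bone_density', ['black', 'red', 'white'], N=256)

plt.matshow(np.random.uniform(size=(5, 5)), cmap=custom_cmap)
plt.colorbar();
```

The intensity 0 maps to black, 0.5 to red, and 1 to white, with linear interpolation in between.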
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import cm
#from colorspacious import cspace_converter
from collections import OrderedDict
cmaps = OrderedDict()
cmaps['Perceptually Uniform Sequential'] = [
'viridis', 'plasma', 'inferno', 'magma', 'cividis']
cmaps['Sequential'] = [
'Greys', 'Purples', 'Blues', 'Greens', 'Oranges', 'Reds',
'YlOrBr', 'YlOrRd', 'OrRd', 'PuRd', 'RdPu', 'BuPu',
'GnBu', 'PuBu', 'YlGnBu', 'PuBuGn', 'BuGn', 'YlGn']
cmaps['Sequential (2)'] = [
'binary', 'gist_yarg', 'gist_gray', 'gray', 'bone', 'pink',
'spring', 'summer', 'autumn', 'winter', 'cool', 'Wistia',
'hot', 'afmhot', 'gist_heat', 'copper']
cmaps['Diverging'] = [
'PiYG', 'PRGn', 'BrBG', 'PuOr', 'RdGy', 'RdBu',
'RdYlBu', 'RdYlGn', 'Spectral', 'coolwarm', 'bwr', 'seismic']
cmaps['Cyclic'] = ['twilight', 'twilight_shifted', 'hsv']
cmaps['Qualitative'] = ['Pastel1', 'Pastel2', 'Paired', 'Accent',
'Dark2', 'Set1', 'Set2', 'Set3',
'tab10', 'tab20', 'tab20b', 'tab20c']
cmaps['Miscellaneous'] = [
'flag', 'prism', 'ocean', 'gist_earth', 'terrain', 'gist_stern',
'gnuplot', 'gnuplot2', 'CMRmap', 'cubehelix', 'brg',
'gist_rainbow', 'rainbow', 'jet', 'nipy_spectral', 'gist_ncar']
nrows = max(len(cmap_list) for cmap_category, cmap_list in cmaps.items())
gradient = np.linspace(0, 1, 256)
gradient = np.vstack((gradient, gradient))
def plot_color_gradients(cmap_category, cmap_list, nrows):
    fig, axes = plt.subplots(nrows=nrows)
    fig.subplots_adjust(top=0.95, bottom=0.01, left=0.2, right=0.99)
    axes[0].set_title(cmap_category + ' colormaps', fontsize=14)
    for ax, name in zip(axes, cmap_list):
        ax.imshow(gradient, aspect='auto', cmap=plt.get_cmap(name))
        pos = list(ax.get_position().bounds)
        x_text = pos[0] - 0.01
        y_text = pos[1] + pos[3] / 2.
        fig.text(x_text, y_text, name, va='center', ha='right', fontsize=10)
    # Turn off *all* ticks & spines, not just the ones with colormaps.
    for ax in axes:
        ax.set_axis_off()

for cmap_category, cmap_list in cmaps.items():
    plot_color_gradients(cmap_category, cmap_list, nrows)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
xlin = np.linspace(0, 1, 100)
colors = ['Red','Green','Blue']
plt.figure(figsize=[15,4])
for i in np.arange(0, 3):
    plt.subplot(1, 3, i + 1)
    plt.scatter(xlin, plt.cm.hot(xlin)[:, i],
                c=plt.cm.hot(xlin), label='hot')
    plt.scatter(xlin, plt.cm.Blues(xlin)[:, i],
                c=plt.cm.Blues(xlin), label='blues')
    plt.scatter(xlin, plt.cm.jet(xlin)[:, i],
                c=plt.cm.jet(xlin), label='jet')
    plt.xlabel('Intensity')
    plt.ylabel('{0} Component'.format(colors[i]))
These transformations can also be non-linear, as in the graph below, where the mapping between the intensity and the color is a $\log$ relationship, meaning the difference between the lower values is much clearer than between the higher ones
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
xlin = np.logspace(-2, 5, 500)
log_xlin = np.log10(xlin)
norm_xlin = (log_xlin-log_xlin.min())/(log_xlin.max()-log_xlin.min())
fig, ax1 = plt.subplots(1,1)
ax1.scatter(xlin, plt.cm.hot(norm_xlin)[:,0],
c = plt.cm.hot(norm_xlin))
ax1.scatter(xlin, plt.cm.hot(xlin/xlin.max())[:,0],
c = plt.cm.hot(norm_xlin))
ax1.set_xscale('log');ax1.set_xlabel('Intensity');ax1.set_ylabel('Red Component');
On a real image the difference is even clearer
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from skimage.io import imread
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 4))
in_img = imread('../common/figures/bone-section.png')[:, :, 0].astype(np.float32)
ax1.imshow(in_img, cmap = 'gray');
ax1.set_title('grayscale LUT');
ax2.imshow(in_img, cmap = 'hot');
ax2.set_title('hot LUT');
ax3.imshow(np.log2(in_img+1), cmap = 'gray');
ax3.set_title('grayscale-log LUT');
For a 3D image, the position or spatial component has a 3rd dimension (z if it is spatial, or t if it is a movie)
Volume | Time series |
import numpy as np
vol_image = np.arange(27).reshape((3,3,3))
print(vol_image)
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]
This can then be rearranged from a table form into an array form and displayed as a series of slices
%matplotlib inline
import matplotlib.pyplot as plt
from skimage.util import montage as montage2d
print(montage2d(vol_image, fill = 0))
plt.matshow(montage2d(vol_image, fill = 0), cmap = 'jet')
[[ 0  1  2  9 10 11]
 [ 3  4  5 12 13 14]
 [ 6  7  8 15 16 17]
 [18 19 20  0  0  0]
 [21 22 23  0  0  0]
 [24 25 26  0  0  0]]
In the images thus far, we have had one value per position, but there is no reason there cannot be multiple values. In fact, this is what color images are: red, green, and blue values, and even 4 channels with transparency (alpha) as a fourth. For clarity, we call the dimensionality of the image the number of dimensions in the spatial position, and the depth the number of values recorded at each position.
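With numpy, using the common convention that the last axis holds the value channels (a layout assumption, not a requirement), the distinction looks like:

```python
import numpy as np

# a color image: 2 spatial dimensions (x, y) with a depth of 3 (R, G, B)
rgb_img = np.zeros((128, 96, 3))
print('dimensionality:', rgb_img.ndim - 1, 'depth:', rgb_img.shape[-1])

# a grayscale volume: 3 spatial dimensions (x, y, z) with a depth of 1
vol_img = np.zeros((64, 64, 64, 1))
print('dimensionality:', vol_img.ndim - 1, 'depth:', vol_img.shape[-1])
```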
import pandas as pd
from itertools import product
import numpy as np
base_df = pd.DataFrame([dict(x = x, y = y) for x,y in product(range(5), range(5))])
base_df['Intensity'] = np.random.uniform(0, 1, 25)
base_df['Transparency'] = np.random.uniform(0, 1, 25)
base_df.head(5)
 | x | y | Intensity | Transparency |
---|---|---|---|---|
0 | 0 | 0 | 0.051095 | 0.618102 |
1 | 0 | 1 | 0.096233 | 0.314752 |
2 | 0 | 2 | 0.226459 | 0.127602 |
3 | 0 | 3 | 0.491454 | 0.273547 |
4 | 0 | 4 | 0.830064 | 0.974110 |
This can then be rearranged from a table form into an array form, with each value displayed as its own channel
%matplotlib inline
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.scatter(base_df['x'], base_df['y'], c = plt.cm.gray(base_df['Intensity']), s = 1000)
ax1.set_title('Intensity')
ax2.scatter(base_df['x'], base_df['y'], c = plt.cm.gray(base_df['Transparency']), s = 1000)
ax2.set_title('Transparency')
fig, (ax1) = plt.subplots(1, 1)
ax1.scatter(base_df['x'], base_df['y'], c = plt.cm.jet(base_df['Intensity']), s = 1000*base_df['Transparency'])
ax1.set_title('Intensity')
At each point in the image (black dot), instead of having just a single value, there is an entire spectrum. A selected group of these (red dots) are shown to illustrate the variations inside the sample. While certainly much more complicated, this still constitutes an image and requires the same sorts of techniques to process correctly.
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
from skimage.io import imread
import os
raw_img = imread(os.path.join('..', 'common', 'data', 'raw.jpg'))
im_pos = pd.read_csv(os.path.join('..', 'common', 'data', 'impos.csv'), header = None)
im_pos.columns = ['x', 'y']
fig, ax1 = plt.subplots(1,1, figsize = (8, 8));
ax1.imshow(raw_img);
ax1.scatter(im_pos['x'], im_pos['y'], s = 1, c = 'blue');
full_df = pd.read_csv(os.path.join('..', 'common', 'data', 'full_img.csv')).query('wavenum<1200')
print(full_df.shape[0], 'rows')
full_df.head(5)
210750 rows
 | x | y | wavenum | val |
---|---|---|---|---|
0 | 168.95 | 358.8 | 750 | 527.571102 |
1 | 168.95 | 358.8 | 753 | 459.778584 |
2 | 168.95 | 358.8 | 756 | 406.337255 |
3 | 168.95 | 358.8 | 759 | 341.858123 |
4 | 168.95 | 358.8 | 762 | 246.645673 |
full_df['g_x'] = pd.cut(full_df['x'], 5)
full_df['g_y'] = pd.cut(full_df['y'], 5)
fig, m_axs = plt.subplots(5, 5, figsize = (12, 12))
for ((g_x, g_y), c_rows), c_ax in zip(full_df.sort_values(['x', 'y']).groupby(['g_x', 'g_y']), m_axs.flatten()):
    c_ax.plot(c_rows['wavenum'], c_rows['val'], 'r.')
import pandas as pd
from io import StringIO
pd.read_table(StringIO("""Modality\tImpulse\tCharacteristic\tResponse\tDetection
Light Microscopy\tWhite Light\tElectronic interactions\tAbsorption\tFilm, Camera
Phase Contrast\tCoherent light\tElectron Density (Index of Refraction)\tPhase Shift\tPhase stepping, holography, Zernike
Confocal Microscopy\tLaser Light\tElectronic Transition in Fluorescence Molecule\tAbsorption and reemission\tPinhole in focal plane, scanning detection
X-Ray Radiography\tX-Ray light\tPhoto effect and Compton scattering\tAbsorption and scattering\tScintillator, microscope, camera
Neutron Radiography\tNeutrons\tInteraction with nucleus\tScattering and absorption\tScintillator, optics, camera
Ultrasound\tHigh frequency sound waves\tMolecular mobility\tReflection and Scattering\tTransducer
MRI\tRadio-frequency EM\tUnmatched Hydrogen spins\tAbsorption and reemission\tRF coils to detect
Atomic Force Microscopy\tSharp Point\tSurface Contact\tContact, Repulsion\tDeflection of a tiny mirror"""))
 | Modality | Impulse | Characteristic | Response | Detection |
---|---|---|---|---|---|
0 | Light Microscopy | White Light | Electronic interactions | Absorption | Film, Camera |
1 | Phase Contrast | Coherent light | Electron Density (Index of Refraction) | Phase Shift | Phase stepping, holography, Zernike |
2 | Confocal Microscopy | Laser Light | Electronic Transition in Fluorescence Molecule | Absorption and reemission | Pinhole in focal plane, scanning detection |
3 | X-Ray Radiography | X-Ray light | Photo effect and Compton scattering | Absorption and scattering | Scintillator, microscope, camera |
4 | Neutron Radiography | Neutrons | Interaction with nucleus | Scattering and absorption | Scintillator, optics, camera |
5 | Ultrasound | High frequency sound waves | Molecular mobility | Reflection and Scattering | Transducer |
6 | MRI | Radio-frequency EM | Unmatched Hydrogen spins | Absorption and reemission | RF coils to detect |
7 | Atomic Force Microscopy | Sharp Point | Surface Contact | Contact, Repulsion | Deflection of a tiny mirror |
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
from skimage.io import imread
from scipy.ndimage import convolve
from skimage.morphology import disk
import numpy as np
import os
bone_img = imread(os.path.join('..', 'common', 'figures', 'tiny-bone.png')).astype(np.float32)
# simulate measured image
conv_kern = np.pad(disk(2), 1, 'constant', constant_values = 0)
meas_img = convolve(bone_img[::-1], conv_kern)
# run deconvolution
dekern = np.fft.ifft2(1/np.fft.fft2(conv_kern))
rec_img = convolve(meas_img, dekern)[::-1]
# show result
fig, (ax_orig, ax1, ax2) = plt.subplots(1,3,
figsize = (12, 4))
ax_orig.imshow(bone_img, cmap = 'bone')
ax_orig.set_title('Original Object')
ax1.imshow(np.real(meas_img), cmap = 'bone')
ax1.set_title('Measurement')
ax2.imshow(np.real(rec_img), cmap = 'bone', vmin = 0, vmax = 255)
ax2.set_title('Reconstructed');
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
from skimage.io import imread
from scipy.ndimage import convolve
from skimage.morphology import disk
import numpy as np
import os
bone_img = imread(os.path.join('..', 'common', 'figures', 'tiny-bone.png')).astype(np.float32)
# simulate measured image
meas_img = np.log10(np.abs(np.fft.fftshift(np.fft.fft2(bone_img))))
print(meas_img.min(), meas_img.max(), meas_img.mean())
fig, (ax1, ax_orig) = plt.subplots(1,2,
figsize = (12, 6))
ax_orig.imshow(bone_img, cmap = 'bone')
ax_orig.set_title('Original Object')
ax1.imshow(meas_img, cmap = 'hot')
ax1.set_title('Measurement');
1.1465129 6.6112556 3.3563936
$s_{ab}$ is the only information you are really interested in, so it is important to remove or correct for the other components
For color (non-monochromatic) images the problem becomes even more complicated $$ \int_{0}^{\infty} {\left[\left([b(x,y,\lambda)*s_{ab}(x,y,\lambda)]\otimes h_{fs}(x,y,\lambda)\right)*h_{op}(x,y,\lambda)\right]*h_{det}(x,y,\lambda)}\mathrm{d}\lambda+d_{dark}(x,y) $$
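To make the pieces concrete, here is a toy monochromatic simulation of that chain, where a single Gaussian blur stands in for the combined $h_{fs}$, $h_{op}$, and $h_{det}$ terms, and Poisson counts play the dark current $d_{dark}$ (all the numbers here are invented for illustration):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(2019)
s_ab = np.zeros((64, 64))
s_ab[24:40, 24:40] = 1.0                       # idealized sample response
b = 100.0                                      # uniform illumination b(x, y)
blurred = gaussian_filter(b * s_ab, sigma=2)   # stand-in for the h terms
d_dark = rng.poisson(5, size=s_ab.shape)       # dark-current counts
measured = blurred + d_dark
```

The recovery problem from the slides is exactly this in reverse: given only `measured`, undo the blur and the dark counts to get back at `s_ab`.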
Inspired by: imagej-pres
Images are great for qualitative analyses since our brains can quickly interpret them without large programming investments.
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
xlin = np.linspace(-1,1, 3)
xx, yy = np.meshgrid(xlin, xlin)
img_a = 25*np.ones((3,3))
img_b = np.ones((3,3))*75
img_a[1,1] = 50
img_b[1,1] = 50
fig, (ax1, ax2) = plt.subplots(1,2, figsize = (12, 5));
ax1.matshow(img_a, vmin = 0, vmax = 100, cmap = 'bone');
ax2.matshow(img_b, vmin = 0, vmax = 100, cmap = 'bone');
Are the intensities constant in the image?
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
xlin = np.linspace(-1,1, 10)
xx, yy = np.meshgrid(xlin, xlin)
fig, ax1 = plt.subplots(1,1, figsize = (6, 6))
ax1.matshow(xx, vmin = -1, vmax = 1, cmap = 'bone')
Science demands repeatability! And it really wants reproducibility.
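One small, concrete step toward repeatable analyses (our own aside, not from the slides) is to seed every source of randomness, so that 'random' test images are identical on every run:

```python
import numpy as np

# the same seed always yields the same 'random' image
rng_a = np.random.default_rng(2019)
rng_b = np.random.default_rng(2019)
img_a = rng_a.uniform(size=(5, 5))
img_b = rng_b.uniform(size=(5, 5))
print(np.array_equal(img_a, img_b))  # True
```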
%matplotlib inline
# stolen from https://gist.github.com/humberto-ortiz/de4b3a621602b78bf90d
import pandas as pd
import matplotlib.pyplot as plt
from io import StringIO
moores_txt=["Id Name Year Count(1000s) Clock(MHz)\n",
"0 MOS65XX 1975 3.51 14\n",
"1 Intel8086 1978 29.00 10\n",
"2 MIPSR3000 1988 120.00 33\n",
"3 AMDAm486 1993 1200.00 40\n",
"4 NexGenNx586 1994 3500.00 111\n",
"5 AMDAthlon 1999 37000.00 1400\n",
"6 IntelPentiumIII 1999 44000.00 1400\n",
"7 PowerPC970 2002 58000.00 2500\n",
"8 AMDAthlon64 2003 243000.00 2800\n",
"9 IntelCore2Duo 2006 410000.00 3330\n",
"10 AMDPhenom 2007 450000.00 2600\n",
"11 IntelCorei7 2008 1170000.00 3460\n",
"12 IntelCorei5 2009 995000.00 3600"]
sio_table = StringIO(''.join(moores_txt))
moore_df = pd.read_table(sio_table, sep=r'\s+', index_col=0)
fig, ax1 = plt.subplots(1, 1, figsize=(8, 4))
ax1.semilogy(moore_df['Year'], moore_df['Count(1000s)'], 'b.-', label='1000s of transistors')
ax1.semilogy(moore_df['Year'], moore_df['Clock(MHz)'], 'r.-', label='Clockspeed (MHz)')
ax1.legend(loc=2);
Based on data from https://gist.github.com/humberto-ortiz/de4b3a621602b78bf90d
There are now many more transistors inside a single computer, but the clock speed hasn't increased. How can this be?
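The extra transistors go into more cores rather than faster ones, so an analysis only speeds up if it is split into independent pieces. A toy sketch (our own example, not from the lecture) that farms image tiles out to separate cores:

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def tile_mean(tile):
    # independent per-tile work that can run on its own core
    return float(tile.mean())

if __name__ == '__main__':
    img = np.arange(64, dtype=float).reshape(8, 8)
    # split the image into four 4x4 tiles
    tiles = [img[r:r + 4, c:c + 4] for r in (0, 4) for c in (0, 4)]
    with ProcessPoolExecutor() as pool:
        tile_means = list(pool.map(tile_mean, tiles))
    print(tile_means)  # [13.5, 17.5, 45.5, 49.5]
```

Note that on spawn-based platforms (Windows, recent macOS) `tile_mean` must live in an importable module rather than a notebook cell; `ThreadPoolExecutor` sidesteps that, but only helps when the per-tile work releases the GIL.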