BMED360-2021 04-mri-knn-tissue-classification.ipynb
In this session you will learn to predict predefined tissue types in a given multispectral MR image using machine learning (supervised classification)
The supervised classification model we will use is a simple K-nearest neighbor (K-NN) classification model, denoted f (described below)
To perform such pixel-wise tissue classification we will make use of the training data (i.e. the training mask) we obtained during the labelling of data (see the figure below of color-coded tissue samples)
In the previous data labeling step (not part of these notebooks) we defined six different classes (tissue types), denoted y, and the corresponding four-channel multispectral MRI data, denoted X.
The notebook is thus a practical machine learning example of the formalism: y≈f(X,θ)
You will also learn to navigate and appreciate the distinction between image space (pixel locations, spatial neighborhoods) and feature vector space (signal intensity value combinations, and similarity of pixel-based and tissue-based "signatures").
--> the following libraries must be pip-installed when running in e.g. Colab
(i.e. uncomment the following pip and wget commands):
#!pip install gdown
#!pip install pydicom
#!pip install simpleitk
#!pip install nilearn
#!wget -O utils.py https://raw.githubusercontent.com/computational-medicine/BMED360-2021/master/Lab2-ML-tissue-classification/utils.py
import gdown
import shutil
import os
import sys
# Download zip-file if ./assets does not exist (as when running in Colab)
if not os.path.isdir('./assets'):
    # Download assets.zip from Google Drive
    # https://drive.google.com/file/d/1gl7rbwkmSS6hVwEXUZuAPwEe28OwCH31/view?usp=sharing
    file_id = '1gl7rbwkmSS6hVwEXUZuAPwEe28OwCH31'
    url = 'https://drive.google.com/uc?id=%s' % file_id
    output = './assets.zip'
    gdown.download(url, output, quiet=False)
    # Unzip the assets file into ./assets
    shutil.unpack_archive(output, '.')
    # Delete the assets.zip file
    os.remove(output)
else:
    print('./assets exists already!')
./assets exists already!
# Make a results directory if it does not exist
r_pth = './results'
if not os.path.isdir(r_pth):
    try:
        os.mkdir(r_pth)
    except OSError:
        print("Creation of the directory %s failed" % r_pth)
    else:
        print("Successfully created the directory %s" % r_pth)
else:
    print("Directory %s already exists!" % r_pth)
Directory ./results already exists!
# HOME = os.path.expanduser('~') # To make path to local home directory
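As an aside, the same directory creation can be written more compactly (a sketch using os.makedirs, which does nothing if the directory already exists):
# Equivalent, more compact idiom: create ./results if it does not exist
os.makedirs(r_pth, exist_ok=True)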
Short description and explanation of KNN-classification:
In pattern recognition, the K-nearest neighbors algorithm (K-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the K closest training samples in the feature space. The output depends on whether K-NN is used for classification or regression:
In K-NN classification, the output is a class membership: an object is assigned by majority vote to the class most common among its K nearest neighbors, where K is a small positive integer (e.g. K∈{1,3,5,7,9,11}, say).
If K=1, then the object is simply assigned to the class of that single nearest neighbor, i.e. nearest neighbor classification.
In K-NN regression, the output is the property value for the object: the average of the values of its K nearest neighbors.
from IPython.display import Image
Image(filename='./assets/knn_illustration.png', width=500)
An instance of K-NN classification
The test sample (the filled green circle), with unknown class membership, is to be classified as either the class marked by blue squares or the class marked by red triangles, i.e. a two-class problem solved with K-NN.
If K=3 (circle drawn with a solid black line) the unknown observation is assigned to the red triangle class, since there are two red-labeled samples and only one blue-labeled sample among the three closest labeled neighbors (using the Euclidean distance metric).
If, on the other hand, K=5 (circle drawn with a dashed black line) the previously unseen (i.e. not labeled) sample is assigned to the blue square class (three blue squares vs. two red triangles among the five closest labeled samples), i.e. "majority voting".
K-NN is a lazy learner because it doesn't learn a discriminative function from the training data but "memorizes" the training dataset instead. By contrast, the logistic regression algorithm learns its model weights (parameters, θ) during training.
For a more elaborate explanation, see the Wikipedia entry for the k-nearest neighbors algorithm.
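To make the algorithm concrete, here is a minimal K-NN classifier sketch in plain NumPy (illustration only; the function name knn_predict is ours, and below we will of course use scikit-learn's KNeighborsClassifier instead):
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, K=3):
    # Euclidean distances from the query point x to all training samples
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the K closest training samples
    nn = np.argsort(dists)[:K]
    # Majority vote among the K nearest labels
    return Counter(y_train[nn]).most_common(1)[0][0]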
We will be using a four-channel multispectral image (slice 60 from a multispectral 3D recording), reported in Lundervold et al. Volume distribution of cerebrospinal fluid using multispectral MR imaging. Medical Image Analysis 2000;4:123-136. https://www.ncbi.nlm.nih.gov/pubmed/10972326, [PDF]
and a manually delineated training mask (cf. "labeling of data") consisting of 6 tissue types (color coded in [R,G,B] space) as follows:
and a manually delineated head ROI mask for spatial restriction of the supervised pixel classification.
Check out: Multispectral MRI analysis started with the seminal work of Vannier et al., Radiology 1985 - inspired by the research (on remote sensing) at NASA!
from IPython.display import Image
Image(filename='./assets/multispectral_tissue_classification_pptx.png', width=600)
%matplotlib inline
# This to be able to display figures and graphs within the notebook browser
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
import numpy as np
import pandas as pd
import nibabel as nib
import imageio
import pydicom as dicom
from sklearn.cluster import KMeans
from nilearn.masking import apply_mask
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
import sklearn
print('scikit-learn:', sklearn.__version__)
scikit-learn: 0.24.1
Define channel names (MRI pulse sequence acronyms) and class labels (tissue types) that are used
mydata = './assets'
chn_names = ['FLASH', 'DESS', 'FISP', 'PSIF']
class_names = ['air', 'gm', 'wm', 'csf', 'mus', 'fat']
Use pydicom to load the series of 2D DICOM files (one per channel)
dcms = []  # for making a sequence of channel images
for fn in chn_names:
    pth = '%s/%s_060.dcm' % (mydata, fn.lower())
    dcms.append(dicom.dcmread(pth))
print(f'{len(dcms)} dicom images loaded')
4 dicom images loaded
Let's load the data (pixel values) from the DICOM structures into a 3D NumPy array and prepare for a NIFTI header
# Get the shape of a single 2D image, e.g. from the first image in the list
rows, cols = dcms[0].pixel_array.shape
# Get the number of DICOM files (one image per channel)
nchn = len(dcms)
# Get the slice thickness
thick = round(float(dcms[0].SliceThickness), 2)
# Get the in-plane pixel spacing
pix_spacing = tuple([float(item) for item in dcms[0].PixelSpacing])
# Get the pixel type of an image
pixelType = dcms[0].pixel_array.dtype
# Create a zero-initialized 3D array with shape (rows, cols, channels) and pixel type
# (possible pixel types: int or float; 8, 16, 32 or 64 bits)
img_data = np.zeros((rows, cols, nchn), dtype=pixelType)
# Fill the 3D array with the data (pixel values) from the loaded DICOM structures,
# reversing the rows of each 2D pixel array to keep the correct orientation (trial and error)
for i, s in enumerate(dcms):
    img_data[:, :, i] = s.pixel_array[::-1, :]
print(f'shape:{img_data.shape}; pixel type:{pixelType} {type(img_data)}; \
pixel spacing:{pix_spacing}; slice thickness: {thick} mm')
shape:(256, 256, 4); pixel type:int16 <class 'numpy.ndarray'>; pixel spacing:(1.0, 1.0); slice thickness: 1.4 mm
Here we construct a new empty NIFTI header
hdr = nib.Nifti1Header()
# Store the desired zooms as a tuple of length 4: in-plane pixel spacing + (slice thickness, 1.0)
# Note: this assigns a plain tuple attribute (shadowing the Nifti1Header.set_zooms() method);
# it is only used below to fill in the diagonal of the affine
hdr.set_zooms = pix_spacing + (thick, 1.0)
hdr.set_zooms
(1.0, 1.0, 1.4, 1.0)
img = nib.Nifti1Image(img_data, affine=np.eye(4))
img.affine[0, 0] = hdr.set_zooms[0]
img.affine[1, 1] = hdr.set_zooms[1]
img.affine[2, 2] = hdr.set_zooms[2]
img.affine
array([[1. , 0. , 0. , 0. ], [0. , 1. , 0. , 0. ], [0. , 0. , 1.4, 0. ], [0. , 0. , 0. , 1. ]])
img = nib.Nifti1Image(img_data, img.affine)
img.header.set_data_dtype(pixelType)
print(img.header)
<class 'nibabel.nifti1.Nifti1Header'> object, endian='<' sizeof_hdr : 348 data_type : b'' db_name : b'' extents : 0 session_error : 0 regular : b'' dim_info : 0 dim : [ 3 256 256 4 1 1 1 1] intent_p1 : 0.0 intent_p2 : 0.0 intent_p3 : 0.0 intent_code : none datatype : int16 bitpix : 16 slice_start : 0 pixdim : [1. 1. 1. 1.4 1. 1. 1. 1. ] vox_offset : 0.0 scl_slope : nan scl_inter : nan slice_end : 0 slice_code : unknown xyzt_units : 0 cal_max : 0.0 cal_min : 0.0 slice_duration : 0.0 toffset : 0.0 glmax : 0 glmin : 0 descrip : b'' aux_file : b'' qform_code : unknown sform_code : aligned quatern_b : 0.0 quatern_c : 0.0 quatern_d : 0.0 qoffset_x : 0.0 qoffset_y : 0.0 qoffset_z : 0.0 srow_x : [1. 0. 0. 0.] srow_y : [0. 1. 0. 0.] srow_z : [0. 0. 1.4 0. ] intent_name : b'' magic : b'n+1'
nib.save(img, './results/04_multispectral_060.nii.gz')
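As a side note, the same diagonal affine can be built in one step from the zooms (a sketch; affine and img_alt are our names):
# Build the 4x4 diagonal affine directly from (pixel spacing, thickness, 1.0)
affine = np.diag(pix_spacing + (thick, 1.0))
img_alt = nib.Nifti1Image(img_data, affine)
print(img_alt.affine)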
Use nibabel to read the multispectral image, and imageio to read the training mask and brain ROI mask
fn_multispectral = './results/04_multispectral_060.nii.gz'
img = nib.load(fn_multispectral)
fn_tmsk = './assets/flash_060_training_mask_6cla.png'
tm = imageio.imread(fn_tmsk)
tmsk = np.array(tm[::-1,:,:], dtype=np.uint8) # tm[::-1,:,:] to reverse up-side-down training mask
tmsk_img = nib.Nifti1Image(tmsk, img.affine)
nib.save(tmsk_img, './results/04_flash_060_training_mask_6cla.nii.gz')
fn_roimsk = './assets/flash_060_brain_mask.png'
rm = imageio.imread(fn_roimsk)
roimsk = np.array(rm[::-1,:,:], dtype=np.uint8) # rm[::-1,:,:] to reverse up-side-down ROI mask
roimsk_img = nib.Nifti1Image(roimsk, img.affine)
nib.save(roimsk_img, './results/04_flash_060_brain_mask.nii.gz')
img.header.get_data_dtype()
dtype('<i2')
Display six-class training mask and ROI brain mask
fig, axes = plt.subplots(1,2, figsize=(14,7))
ax = axes.ravel()
ax[0].imshow(tmsk, origin='lower')
ax[0].axis('on')
ax[0].set_title("Six-class training mask", size=16)
ax[0].set_xticks([]), ax[0].set_yticks([])
ax[1].imshow(roimsk, origin='lower')
ax[1].axis('on')
ax[1].set_title("ROI brain mask", size=16)
ax[1].set_xticks([]), ax[1].set_yticks([])
plt.show()
print('**Multispectral image info:**')
print('shape of image = ', img.header.get_data_shape())
print('units = ', img.header.get_xyzt_units())
print('voxel size = ', img.header.get_zooms())
print('dtype = %s' % img.header.get_data_dtype())
if img.header.get_data_dtype() == 'int16': # Convert pixel values to int16 (squeeze any singleton dimension)
dat = np.int16(img.get_fdata().squeeze())
print('min = %.1f' % dat.min())
print('max = %.1f' % dat.max())
print('number of channels =', img.shape[-1])
print('shape of 2D+spectral img_data = ',dat.shape)
print('dtype of 2D+spectral img_data = ',dat.dtype)
print('img affine:\n', img.affine)
**Multispectral image info:** shape of image = (256, 256, 4) units = ('unknown', 'unknown') voxel size = (1.0, 1.0, 1.4) dtype = int16 min = 0.0 max = 493.0 number of channels = 4 shape of 2D+spectral img_data = (256, 256, 4) dtype of 2D+spectral img_data = int16 img affine: [[1. 0. 0. 0. ] [0. 1. 0. 0. ] [0. 0. 1.39999998 0. ] [0. 0. 0. 1. ]]
print('**Training mask info:**')
print('shape = ', tmsk_img.header.get_data_shape())
print('voxel size = ', tmsk_img.header.get_zooms())
print('dtype tmsk = %s' % tmsk_img.header.get_data_dtype())
if tmsk_img.header.get_data_dtype() == 'uint8': # Convert mask values to uint8 (squeeze any singleton dimension)
tmsk_data = np.uint8(tmsk_img.get_fdata().squeeze())
print('min mask value = %.0f' % tmsk_data.min())
print('max mask value = %.0f' % tmsk_data.max())
print('shape of 2D tmsk_data = ', tmsk_data[:,:,0].shape)
print('dtype of 2D tmsk_data = ', tmsk_data.dtype)
print('tmsk affine:', tmsk_img.affine)
**Training mask info:** shape = (256, 256, 3) voxel size = (1.0, 1.0, 1.4) dtype tmsk = uint8 min mask value = 0 max mask value = 255 shape of 2D tmsk_data = (256, 256) dtype of 2D tmsk_data = uint8 tmsk affine: [[1. 0. 0. 0. ] [0. 1. 0. 0. ] [0. 0. 1.39999998 0. ] [0. 0. 0. 1. ]]
print('**Brain ROI mask info:**')
print('shape = ', roimsk_img.header.get_data_shape())
print('voxel size = ', roimsk_img.header.get_zooms())
print('dtype roimsk = %s' % roimsk_img.header.get_data_dtype())
if roimsk_img.header.get_data_dtype() == 'uint8': # Convert mask values to uint8 (squeeze any singleton dimension)
roimsk_data = np.uint8(roimsk_img.get_fdata().squeeze())
print('min mask value = %.0f' % roimsk_data.min())
print('max mask value = %.0f' % roimsk_data.max())
print('shape of 2D roimsk_data = ', roimsk_data[:,:,0].shape)
print('dtype of 2D roimsk_data = ', roimsk_data.dtype)
print('roimsk affine:\n', roimsk_img.affine)
**Brain ROI mask info:** shape = (256, 256, 3) voxel size = (1.0, 1.0, 1.4) dtype roimsk = uint8 min mask value = 0 max mask value = 255 shape of 2D roimsk_data = (256, 256) dtype of 2D roimsk_data = uint8 roimsk affine: [[1. 0. 0. 0. ] [0. 1. 0. 0. ] [0. 0. 1.39999998 0. ] [0. 0. 0. 1. ]]
fig, axes = plt.subplots(2,2, figsize=(14,14))
ax = axes.ravel()
for k, ch in enumerate(chn_names):
ax[k].imshow(dat[:, :, k], cmap='gray', origin='lower')
ax[k].set_title(ch)
ax[k].set(xlabel="")
ax[k].axis('off')
plt.suptitle("The multispectral MRI slice (data, $\mathbf{X} \subset \mathbf{R}^4$)", size=16)
plt.tight_layout()
plt.show()
Display the training mask color-coded according to a pre-defined color lookup table (LUT)
We use a dictionary for the color LUT:
import matplotlib
col_code = {
'BCK': [255,255,255], # White (background)
'AIR': [255,0,255], # Magenta
'GM': [255,0,0], # Red
'WM': [0,255,255], # Cyan
'CSF': [0,0,255], # Blue
'MUS': [0,255,0], # Green
'FAT': [255,255,0], # Yellow
'ERR': [0,0,0] # Black (error in labeling, should be BCK)
}
cla_names = list(col_code.keys())
ncla = len(cla_names)
colors = np.array(list(col_code.values()))/255 # scale to interval 0-1
mycmap = matplotlib.colors.ListedColormap(colors)
cla_cmap = matplotlib.cm.get_cmap(mycmap, ncla) # ncla discrete colors
tmask = np.zeros(tmsk_data[:,:,0].shape, dtype=np.uint8)
R = tmsk_data[:,:,0].squeeze()
G = tmsk_data[:,:,1].squeeze()
B = tmsk_data[:,:,2].squeeze()
for idx, (key, val) in enumerate(col_code.items()):
mm = np.where((R == val[0]) & (G == val[1]) & (B == val[2]))
tmask[mm[0],mm[1]] = (idx+1) # mm[0]: rows, mm[1]: cols
print(f'{idx}-{key}: {len(mm[0])} pixels (RGB={val})')
print(f'\ntmask.dtype: {tmask.dtype}')
0-BCK: 63519 pixels (RGB=[255, 255, 255]) 1-AIR: 1250 pixels (RGB=[255, 0, 255]) 2-GM: 136 pixels (RGB=[255, 0, 0]) 3-WM: 129 pixels (RGB=[0, 255, 255]) 4-CSF: 51 pixels (RGB=[0, 0, 255]) 5-MUS: 250 pixels (RGB=[0, 255, 0]) 6-FAT: 170 pixels (RGB=[255, 255, 0]) 7-ERR: 31 pixels (RGB=[0, 0, 0]) tmask.dtype: uint8
u, c = np.unique(tmask, return_counts=True)
print(f'Unique values: {list(u)} \nNumber of occurrences: {list(c)}')
Unique values: [1, 2, 3, 4, 5, 6, 7, 8] Number of occurrences: [63519, 1250, 136, 129, 51, 250, 170, 31]
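The RGB-to-label loop above can also be vectorized (a sketch; lut and tmask_vec are our names). Every pixel here matches exactly one LUT color, so argmax over the match dimension recovers the same labels:
lut = np.array(list(col_code.values()), dtype=np.uint8)                # (ncla, 3)
matches = (tmsk_data[:, :, None, :3] == lut[None, None]).all(axis=-1)  # (rows, cols, ncla)
tmask_vec = np.uint8(matches.argmax(axis=-1) + 1)                      # labels 1..ncla
print(np.array_equal(tmask_vec, tmask))                                # expected: True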
Construct a list of class names and corresponding class numbers, class-mask values, and class cardinality
cla_names_num = []
for i in range(ncla):
    label = cla_names[i] + ' [%d]: val=%d, card=%d' % (i, u[i], c[i])
    cla_names_num.append(label)
cla_names_num[:]
['BCK [0]: val=1, card=63519', 'AIR [1]: val=2, card=1250', 'GM [2]: val=3, card=136', 'WM [3]: val=4, card=129', 'CSF [4]: val=5, card=51', 'MUS [5]: val=6, card=250', 'FAT [6]: val=7, card=170', 'ERR [7]: val=8, card=31']
fig, ax = plt.subplots(figsize=(10,12))
cmsk = ax.imshow(tmsk_data[:, :])
clim=cmsk.properties()['clim']
cax = ax.imshow(tmsk_data[:, :], cmap=cla_cmap, origin='lower', clim=clim)
ax.set_title('Color-coded training mask\n')
ax.axis('on')
cbar = fig.colorbar(cax, shrink=0.4, label='Tissue classes [0-%d] and their cardinality' % (ncla-1))
tick_locs = np.linspace(ncla*2+clim[0]+0.5, ((255-2*ncla)/255)*clim[1]-0.5, ncla)
cbar.set_ticks(tick_locs)
cbar.ax.set_yticklabels(cla_names_num)
plt.tight_layout()
plt.show()
fig.savefig('./results/04_training_mask_1_6_color_coded.png', transparent=False, dpi=300, bbox_inches="tight")
#fig.savefig('%s/prj/MMIV-DLN-AI-2021/results/training_mask_1_6_color_coded.png' % (HOME),
# transparent=False, dpi=300, bbox_inches="tight")
c_tmsk_data = tmsk_data.copy()
mm = np.where((R == 0) & (G == 0) & (B == 0))  # Black is an error in labeling; map to BCK = white (or possibly to CSF)
c_tmsk_data[mm[0], mm[1], 0] = 255  # set R to 255 (BCK = white)
c_tmsk_data[mm[0], mm[1], 1] = 255  # set G to 255
c_tmsk_data[mm[0], mm[1], 2] = 255  # set B to 255
Remove the last element (ERR) from the col_code dictionary
c_col_code = col_code.copy() # Shallow copy is enough here; popitem() on the copy does not affect the original col_code
err_col = c_col_code.popitem()
c_col_code
{'BCK': [255, 255, 255], 'AIR': [255, 0, 255], 'GM': [255, 0, 0], 'WM': [0, 255, 255], 'CSF': [0, 0, 255], 'MUS': [0, 255, 0], 'FAT': [255, 255, 0]}
c_tmask = np.zeros(c_tmsk_data[:,:,0].shape, dtype=np.uint8)
R = c_tmsk_data[:,:,0].squeeze()
G = c_tmsk_data[:,:,1].squeeze()
B = c_tmsk_data[:,:,2].squeeze()
for idx, (key, val) in enumerate(c_col_code.items()):
mm = np.where((R == val[0]) & (G == val[1]) & (B == val[2]))
c_tmask[mm[0],mm[1]] = (idx+1) # mm[0]: rows, mm[1]: cols
print(f'{idx}-{key}: {len(mm[0])} pixels (RGB={val})')
0-BCK: 63550 pixels (RGB=[255, 255, 255]) 1-AIR: 1250 pixels (RGB=[255, 0, 255]) 2-GM: 136 pixels (RGB=[255, 0, 0]) 3-WM: 129 pixels (RGB=[0, 255, 255]) 4-CSF: 51 pixels (RGB=[0, 0, 255]) 5-MUS: 250 pixels (RGB=[0, 255, 0]) 6-FAT: 170 pixels (RGB=[255, 255, 0])
Update the class colormap c_cla_cmap accordingly
c_cla_names = list(c_col_code.keys())
c_ncla = len(c_cla_names)
c_colors = np.array(list(c_col_code.values()))/255 # scale to interval 0-1
c_mycmap = matplotlib.colors.ListedColormap(c_colors)
c_cla_cmap = matplotlib.cm.get_cmap(c_mycmap, c_ncla) # ncla discrete colors
u, c = np.unique(c_tmask, return_counts=True)
print(f'Unique values: {list(u)} \nNumber of occurrences: {list(c)}')
Unique values: [1, 2, 3, 4, 5, 6, 7] Number of occurrences: [63550, 1250, 136, 129, 51, 250, 170]
c_cla_names_num = []
for i in range(c_ncla):
    label = c_cla_names[i] + ' [%d]: val=%d, card=%d' % (i, u[i], c[i])
    c_cla_names_num.append(label)
c_cla_names_num[:]
['BCK [0]: val=1, card=63550', 'AIR [1]: val=2, card=1250', 'GM [2]: val=3, card=136', 'WM [3]: val=4, card=129', 'CSF [4]: val=5, card=51', 'MUS [5]: val=6, card=250', 'FAT [6]: val=7, card=170']
fig, ax = plt.subplots(figsize=(10,12))
cmsk = ax.imshow(c_tmsk_data[:, :])
clim=cmsk.properties()['clim']
cax = ax.imshow(c_tmsk_data[:, :], cmap=c_cla_cmap, origin='lower', clim=clim)
ax.set_title('Color-coded corrected training mask\n', size=16)
ax.axis('on')
ax.set_yticklabels("")
ax.set_xticklabels("")
cbar = fig.colorbar(cax, shrink=0.4, label='Tissue classes [0-%d] and their cardinality' % (c_ncla-1))
tick_locs = np.linspace(c_ncla*2+clim[0]+0.5, ((255-2*c_ncla)/255)*clim[1]-0.5, c_ncla)
cbar.set_ticks(tick_locs)
cbar.ax.set_yticklabels(c_cla_names_num)
plt.tight_layout()
plt.show()
fig.savefig('./results/04_corrected_training_mask_1_6_color_coded.png', transparent=False, dpi=300, bbox_inches="tight")
FVB = Feature Vector Base; the training mask here is a multiclass training mask. We use np.where() to find the locations of the labelled tissue pixels, and then extract the corresponding data (feature vectors) from the multispectral image, which is spatially aligned with the training mask:
# Find pixel locations corresponding to AIR (class 1), GM (class 2), ..., FAT (class 6)
# and get the MRI data from dat
frames = []  # list of per-class data frames
for cn, cla in enumerate(class_names):
    ind = np.where(c_tmask == cn + 2)  # Find indices for the given class; mask values start at 2 for AIR (BCK has value 1)
    df = pd.DataFrame(dat[ind[0], ind[1], :], columns=chn_names)
    df.insert(len(df.columns), 'Class', cla.upper())  # Last column is the class name
    frames.append(df)
# Concatenate the per-class frames into the feature vector base (FVB)
FVB = pd.concat(frames, ignore_index=True)
Check some training data: $(x_{i1}, x_{i2}, x_{i3}, x_{i4}, y_i)$ for $i = 0, 1, \ldots, 4$ and for $i = N-5, N-4, \ldots, N-1$, where $N$ is the total number of labelled pixels
FVB.head()
FLASH | DESS | FISP | PSIF | Class | |
---|---|---|---|---|---|
0 | 5 | 15 | 13 | 8 | AIR |
1 | 13 | 16 | 8 | 10 | AIR |
2 | 11 | 14 | 3 | 10 | AIR |
3 | 3 | 12 | 4 | 7 | AIR |
4 | 9 | 10 | 6 | 2 | AIR |
FVB.tail()
FLASH | DESS | FISP | PSIF | Class | |
---|---|---|---|---|---|
1981 | 245 | 64 | 109 | 152 | FAT |
1982 | 267 | 67 | 124 | 182 | FAT |
1983 | 281 | 65 | 145 | 176 | FAT |
1984 | 267 | 56 | 109 | 166 | FAT |
1985 | 287 | 53 | 148 | 214 | FAT |
Class-specific summary statistics from the FVB across the different features (channels, or pulse sequences)
FVB.groupby('Class').describe(percentiles = [0.5]).round(3).T
Class | AIR | CSF | FAT | GM | MUS | WM | |
---|---|---|---|---|---|---|---|
FLASH | count | 1250.000 | 51.000 | 170.000 | 136.000 | 250.000 | 129.000 |
mean | 6.054 | 29.196 | 264.435 | 112.728 | 101.336 | 162.721 | |
std | 3.467 | 6.672 | 40.778 | 11.184 | 8.806 | 6.870 | |
min | 0.000 | 13.000 | 140.000 | 91.000 | 79.000 | 132.000 | |
50% | 6.000 | 30.000 | 271.000 | 112.000 | 103.000 | 163.000 | |
max | 19.000 | 46.000 | 338.000 | 152.000 | 123.000 | 178.000 | |
DESS | count | 1250.000 | 51.000 | 170.000 | 136.000 | 250.000 | 129.000 |
mean | 6.925 | 129.294 | 45.176 | 127.426 | 50.768 | 117.876 | |
std | 4.885 | 70.856 | 9.184 | 19.702 | 7.178 | 6.721 | |
min | 0.000 | 42.000 | 24.000 | 100.000 | 28.000 | 101.000 | |
50% | 6.000 | 110.000 | 45.000 | 124.500 | 51.000 | 118.000 | |
max | 26.000 | 258.000 | 71.000 | 217.000 | 70.000 | 137.000 | |
FISP | count | 1250.000 | 51.000 | 170.000 | 136.000 | 250.000 | 129.000 |
mean | 6.551 | 71.039 | 133.688 | 171.596 | 103.372 | 187.752 | |
std | 4.468 | 12.167 | 25.817 | 16.722 | 21.548 | 8.113 | |
min | 0.000 | 50.000 | 72.000 | 127.000 | 34.000 | 174.000 | |
50% | 5.000 | 70.000 | 135.000 | 172.000 | 102.000 | 187.000 | |
max | 26.000 | 99.000 | 185.000 | 221.000 | 144.000 | 209.000 | |
PSIF | count | 1250.000 | 51.000 | 170.000 | 136.000 | 250.000 | 129.000 |
mean | 6.005 | 285.412 | 152.312 | 154.750 | 45.328 | 112.364 | |
std | 3.445 | 104.923 | 28.726 | 68.427 | 12.104 | 18.024 | |
min | 0.000 | 104.000 | 68.000 | 103.000 | 19.000 | 66.000 | |
50% | 6.000 | 324.000 | 154.000 | 126.000 | 45.000 | 113.000 | |
max | 19.000 | 418.000 | 214.000 | 439.000 | 78.000 | 168.000 |
FVB.to_csv('./results/04_multispectral_mri_training_data_from_manual_png_mask.csv', index=False)
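To connect this back to the notion of feature vector space from the introduction, a pairwise scatter plot of the FVB (a sketch; may take a few seconds to render) visualizes how well the six tissue classes separate in signal intensity space:
# Pairwise scatter plots of the four channels, colored by tissue class
sns.pairplot(FVB, hue='Class', plot_kws={'s': 10})
plt.show()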
See also https://machinelearningmastery.com/tutorial-to-implement-k-nearest-neighbors-in-python-from-scratch
Read the training data (X and y) from file
FVB = pd.read_csv('./results/04_multispectral_mri_training_data_from_manual_png_mask.csv')
FVB.head()
FLASH | DESS | FISP | PSIF | Class | |
---|---|---|---|---|---|
0 | 5 | 15 | 13 | 8 | AIR |
1 | 13 | 16 | 8 | 10 | AIR |
2 | 11 | 14 | 3 | 10 | AIR |
3 | 3 | 12 | 4 | 7 | AIR |
4 | 9 | 10 | 6 | 2 | AIR |
Extract the data matrix X_train and the corresponding vector of labels y_train from the FVB dataframe
X_train = FVB.iloc[:, :-1].values.astype(float) # The data X
y_train = FVB.iloc[:, -1].values # Last column is 'Class'
X_train.shape
(1986, 4)
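Note that the training set is heavily imbalanced (AIR alone accounts for 1250 of the 1986 samples, cf. the class cardinalities above), which matters when K grows large. A quick check:
# Inspect the class balance of the training labels
print(FVB['Class'].value_counts())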
Construct the test data X_test defined by the head ROI mask
To be able to map classified pixels (i.e. pixel-based feature vectors in signal intensity space $\subset \mathbb{R}^4$) back into image space $\subset \mathbb{N}^+ \times \mathbb{N}^+$ we will need to store their row-column (i,j) locations - not only the feature vectors.
# fn_roimsk = './data/mri/brain_roi_mask.nii.gz'
fn_roimsk = './results/04_flash_060_brain_mask.nii.gz'
roimsk_img = nib.load(fn_roimsk)
if roimsk_img.header.get_data_dtype() == 'uint8':
roimsk_data = np.uint8(roimsk_img.get_fdata())
print(roimsk_data.shape)
print(roimsk_data.min())
print(roimsk_data.max())
# For using the whole image as ROI, uncomment
# roimsk_data = np.ones(roimsk_data.shape)
(256, 256, 3) 0 255
plt.imshow(roimsk_data[:,:,:], origin='lower')
plt.show()
fn_multispectral = './results/04_multispectral_060.nii.gz'
img = nib.load(fn_multispectral)
if img.header.get_data_dtype() == 'int16': # Convert pixel values to int16
data = np.int16(img.get_fdata())
fig, axes = plt.subplots(2,2, figsize=(10,10))
ax = axes.ravel()
for k, ch in enumerate(chn_names):
ax[k].imshow(data[:,:,k].squeeze() + roimsk_data[:,:,1].squeeze(), cmap='gray', origin='lower')
ax[k].set_title('%d-%s' % (k+1, ch))
ax[k].set(xlabel="")
ax[k].axis('off')
plt.show()
Construct X_test and save the test data samples and their corresponding pixel locations as a Pandas dataframe
# Find pixel locations corresponding to brain ROI (value red =[255, 0, 0])
ind_test = np.where((roimsk_data[:,:,0] == 255) & (roimsk_data[:,:,1] == 0) & (roimsk_data[:,:,2] == 0))
X_test = np.asarray(data[ind_test[0][:],ind_test[1][:],:]) # The multispectral signal intensities
roimask = np.zeros(roimsk_data[:,:,0].shape)
roimask[ind_test[0][:], ind_test[1][:]] = 1 # The ROI brain mask values
dfTest = pd.DataFrame(X_test, columns = chn_names)
dfTest.insert(loc = len(dfTest.columns),
column = 'row',
value = ind_test[0]) # Row of pixel location
dfTest.insert(loc = len(dfTest.columns),
column = 'col',
value = ind_test[1]) # Col of pixel location
plt.imshow(roimask, cmap='gray', origin='lower')
plt.show()
print(roimask.max(), roimask.shape)
1.0 (256, 256)
dfTest.head()
FLASH | DESS | FISP | PSIF | row | col | |
---|---|---|---|---|---|---|
0 | 56 | 28 | 22 | 59 | 42 | 143 |
1 | 49 | 20 | 34 | 47 | 42 | 144 |
2 | 32 | 17 | 63 | 35 | 42 | 145 |
3 | 36 | 28 | 86 | 33 | 42 | 146 |
4 | 61 | 44 | 90 | 43 | 42 | 147 |
dfTest.tail()
FLASH | DESS | FISP | PSIF | row | col | |
---|---|---|---|---|---|---|
35817 | 12 | 8 | 6 | 15 | 240 | 140 |
35818 | 13 | 5 | 12 | 14 | 240 | 141 |
35819 | 17 | 2 | 14 | 7 | 240 | 142 |
35820 | 14 | 4 | 13 | 6 | 240 | 143 |
35821 | 6 | 2 | 12 | 5 | 240 | 144 |
Feature scaling of X_train and X_test
(Hint: see https://en.wikipedia.org/wiki/Feature_scaling)
We will use the StandardScaler from scikit-learn:
scaler = StandardScaler()
scaler.fit(X_train.astype(float))
X_train_scaled = scaler.transform(X_train.astype(float))
X_test_scaled = scaler.transform(X_test.astype(float))
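As a sanity check (a sketch; manual is our name), StandardScaler computes z = (x - mean) / std per feature, with the statistics estimated from X_train only:
# Manual standardization should match the scaler output
manual = (X_test.astype(float) - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(manual, X_test_scaled))  # expected: True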
We make Pandas dataframes from the pair of scaled X_train (dX) and corresponding y_train (dy) to confirm the effect of feature scaling:
dX = pd.DataFrame(X_train_scaled, columns=chn_names)
dy = pd.DataFrame(y_train, columns=['Class'])
FVB_train = pd.concat([dX, dy], axis=1)
FVB_train.describe(percentiles = [0.5]).round(4).T
count | mean | std | min | 50% | max | |
---|---|---|---|---|---|---|
FLASH | 1986.0 | 0.0 | 1.0003 | -0.7119 | -0.6019 | 3.4196 |
DESS | 1986.0 | -0.0 | 1.0003 | -0.7713 | -0.5691 | 5.0273 |
FISP | 1986.0 | 0.0 | 1.0003 | -0.8068 | -0.6583 | 2.4738 |
PSIF | 1986.0 | 0.0 | 1.0003 | -0.6589 | -0.5347 | 5.3990 |
This demonstrates that we have obtained zero mean and unit variance for each feature in the training data after scaling.
K-NN classification of the (non-scaled) training data
We use KNeighborsClassifier from scikit-learn:
K = 1
classifier = KNeighborsClassifier(n_neighbors=K)
classifier.fit(X_train, y_train)
KNeighborsClassifier(n_neighbors=1)
In our case, we have not labeled all the pixels in the ROI (that would take a lot of time and effort, and would also be prone to intra- and inter-observer variation).
Thus, we do not have the "ground truth" y_test (only y_train).
We have to rely on plausible results (not quantitative performance metrics) for the test dataset (see later).
Initially, we examine the performance of our classifier on the training dataset.
Usually, the classifier should perform very well on training data (by the very construction of the classifier), and we will examine this situation.
We obtain perfect classification! (as seen below)
y_train_pred = classifier.predict(X_train)
Confusion matrix
print(confusion_matrix(y_train, y_train_pred))
[[1250 0 0 0 0 0] [ 0 51 0 0 0 0] [ 0 0 170 0 0 0] [ 0 0 0 136 0 0] [ 0 0 0 0 250 0] [ 0 0 0 0 0 129]]
Classification report
Check the documentation of classification_report in the scikit-learn library:
print(classification_report(y_train, y_train_pred))
precision recall f1-score support AIR 1.00 1.00 1.00 1250 CSF 1.00 1.00 1.00 51 FAT 1.00 1.00 1.00 170 GM 1.00 1.00 1.00 136 MUS 1.00 1.00 1.00 250 WM 1.00 1.00 1.00 129 accuracy 1.00 1986 macro avg 1.00 1.00 1.00 1986 weighted avg 1.00 1.00 1.00 1986
We now repeat the experiment with a (much) larger neighborhood:
K = 51
classifier = KNeighborsClassifier(n_neighbors=K)
classifier.fit(X_train, y_train)
y_train_pred = classifier.predict(X_train)
print(confusion_matrix(y_train, y_train_pred))
[[1250 0 0 0 0 0] [ 0 38 0 0 13 0] [ 0 0 170 0 0 0] [ 0 4 0 132 0 0] [ 0 0 0 0 250 0] [ 0 0 0 1 0 128]]
print(classification_report(y_train, y_train_pred))
precision recall f1-score support AIR 1.00 1.00 1.00 1250 CSF 0.90 0.75 0.82 51 FAT 1.00 1.00 1.00 170 GM 0.99 0.97 0.98 136 MUS 0.95 1.00 0.97 250 WM 1.00 0.99 1.00 129 accuracy 0.99 1986 macro avg 0.97 0.95 0.96 1986 weighted avg 0.99 0.99 0.99 1986
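A principled way to choose K (a sketch, not part of the original analysis) is cross-validation on the training data; note that the accuracy here is dominated by the large AIR class:
from sklearn.model_selection import cross_val_score

# 5-fold cross-validated accuracy for a range of odd K values
for K in [1, 3, 5, 7, 9, 11]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=K),
                             X_train_scaled, y_train, cv=5)
    print(f'K={K}: mean CV accuracy = {scores.mean():.3f}')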
Initially we let K=5 and use the "zero mean, unit variance" feature scaling (computed across all classes)
K = 5
classifier = KNeighborsClassifier(n_neighbors=K)
classifier.fit(X_train_scaled, y_train)
# classifier.fit(X_train, y_train)
KNeighborsClassifier()
Prediction on the scaled X_test_scaled dataset (delineated by the head ROI mask):
y_pred = classifier.predict(X_test_scaled)
# y_pred = classifier.predict(X_test)
print('Number of classified pixels:', len(y_pred))
print('\nThe first 10 predictions in ROI:', y_pred[:10])
Number of classified pixels: 35822 The first 10 predictions in ROI: ['MUS' 'MUS' 'AIR' 'MUS' 'MUS' 'MUS' 'MUS' 'MUS' 'MUS' 'MUS']
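KNeighborsClassifier can also report the class-membership proportions among the K neighbors (a sketch, shown for the first three test pixels):
# Fraction of each class among the K=5 nearest neighbors
proba = classifier.predict_proba(X_test_scaled[:3])
print(classifier.classes_)
print(proba.round(2))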
Make a Pandas dataframe for the prediction
Later we will transform the named tissue types to a numerical encoding using a dictionary
df_y_pred = pd.DataFrame(y_pred, columns=['Class'])
print(df_y_pred.info())
df_y_pred.head(10).T
<class 'pandas.core.frame.DataFrame'> RangeIndex: 35822 entries, 0 to 35821 Data columns (total 1 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Class 35822 non-null object dtypes: object(1) memory usage: 280.0+ KB None
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
---|---|---|---|---|---|---|---|---|---|---|
Class | MUS | MUS | AIR | MUS | MUS | MUS | MUS | MUS | MUS | MUS |
df_y_pred.tail(10).T
35812 | 35813 | 35814 | 35815 | 35816 | 35817 | 35818 | 35819 | 35820 | 35821 | |
---|---|---|---|---|---|---|---|---|---|---|
Class | AIR | AIR | AIR | AIR | AIR | AIR | AIR | AIR | AIR | AIR |
Construct a dictionary that defines a one-to-one map between tissue type names and tissue class numbers
cla_code = {
'BCK': 0, # White (background) Color-coding according to cla_cmap defined above
'AIR': 1, # Magenta
'GM': 2, # Red
'WM': 3, # Cyan
'CSF': 4, # Blue
'MUS': 5, # Green
'FAT': 6 # Yellow
}
df_y_pred['Class'] = df_y_pred['Class'].map(cla_code)
# Note: if the dictionary does not exhaustively map all entries
# then non-matched entries are changed to NaNs
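A quick sanity check (a sketch) that the mapping was indeed exhaustive:
# All predicted labels should have been mapped to a class number (no NaNs)
assert not df_y_pred['Class'].isna().any(), 'found unmapped tissue labels'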
Initially, we "lift" (i.e. add a constant to) the range of signal intensities in the FLASH channel to avoid mixing FLASH data values with the overlaid tissue class numbers
# Adding max classnumber + 1 to the FLASH data to avoid mixing data values with predicted class-numbers
cla_data = np.int16((data[:,:,0].copy().squeeze() + (df_y_pred.values.max() + 1)*np.ones(roimsk_data[:,:,0].shape)))
mx = cla_data.max()
mn = cla_data.min()
print('min - max =', mn, '-', mx)
# Scaling to range [0, 1]
cla_data = cla_data/mx
mx = cla_data.max().round(4)
mn = cla_data.min().round(4)
print('min - max =', mn, '-', mx)
cla_data.shape
min - max = 7 - 364 min - max = 0.0192 - 1.0
(256, 256)
Fill the cla_data matrix with the pixel-wise KNN predictions (scaled to the interval [0, 1])
mx = df_y_pred['Class'].values.max() # Max class value
Insert the predicted class numbers in the corresponding pixel locations (row, col) in the head ROI mask, i.e. going from feature space to image space
cla_data[dfTest['row'].values, dfTest['col'].values] = df_y_pred['Class'].values / mx
fig, axes = plt.subplots(2,2, figsize=(15,15))
ax = axes.ravel()
ax[0].imshow(c_tmsk_data[:, :], cmap=c_cla_cmap, origin='lower')
ax[0].set_title('Color-coded corrected training mask')
ax[0].axis('on')
ax[1].imshow(data[:, :, 0].squeeze(), cmap='gray', origin='lower')
ax[1].set_title('FLASH')
ax[2].imshow(cla_data[:, :] / mx, cmap='gray', origin='lower')
ax[2].set_title('Tissue-class prediction (gray-scale)')
# Hadamard (element-wise) product: np.multiply(a, b)
cla_within_roi = np.multiply(cla_data[:, :], roimask) # / mx
u, c = np.unique(cla_within_roi, return_counts=True)
s = 0.5*(u[-1]-u[0])/(c_ncla+1)
c_cla_names_num = []
for i in range(c_ncla):
    label = c_cla_names[i] + ' [%d]: card=%d' % (i, c[i])
    c_cla_names_num.append(label)
cmsk = ax[3].imshow(cla_within_roi)
clim=cmsk.properties()['clim']
cax = ax[3].imshow(cla_within_roi, cmap=c_cla_cmap, origin='lower', clim=clim)
cbar = fig.colorbar(cax, shrink=0.4, label='Tissue classes [0-%d] and their cardinality' % (c_ncla-1))
tick_locs = np.linspace(u[0]+s, u[-1]-s, c_ncla)
cbar.set_ticks(tick_locs)
cbar.ax.set_yticklabels(c_cla_names_num)
ax[3].set_title('Tissue-class prediction (color-coded)')
plt.suptitle('KNN classification (K=%d) of the multispectral MRI slice within the ROI brain mask' % K, size=16)
plt.tight_layout()
plt.show()
fig.savefig('./results/04_KNN_(K=%d)_classification_results_on_flash.png' % K, transparent=False, dpi=300, bbox_inches="tight")
Discuss the K-NN classification results (e.g. plausibility? any likely misclassifications? ...)
Repeat the KNN-classification experiments using the original signal intensities (no feature scaling)
Compare and discuss your findings with those obtained using feature scaling
How could you quantitatively assess the difference in predictions between feature scaling and no feature scaling? (a possible starting point is sketched below)
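One possible starting point for the last question (a sketch; classifier_raw and y_pred_raw are our names): refit K-NN on the raw intensities and quantify agreement between the two prediction vectors, e.g. with the pixel-wise agreement rate and Cohen's kappa:
from sklearn.metrics import cohen_kappa_score

# Fit on non-scaled training data and predict on non-scaled test data
classifier_raw = KNeighborsClassifier(n_neighbors=K).fit(X_train, y_train)
y_pred_raw = classifier_raw.predict(X_test)
print('Pixel-wise agreement: %.3f' % np.mean(y_pred == y_pred_raw))
print("Cohen's kappa: %.3f" % cohen_kappa_score(y_pred, y_pred_raw))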
Construct y_train_num, where the class labels are mapped to numeric values:
df_y_train = pd.DataFrame(y_train, columns=['Class'])
print(df_y_train.info())
df_y_train.head().T
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1986 entries, 0 to 1985 Data columns (total 1 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Class 1986 non-null object dtypes: object(1) memory usage: 15.6+ KB None
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
Class | AIR | AIR | AIR | AIR | AIR |
df_y_train['Class'] = df_y_train['Class'].map(cla_code)
y_train_num = df_y_train['Class'].values
For comparison, we train a Random Forest (RF) classifier:
classifierRF = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
classifierRF.fit(X_train_scaled, y_train_num)
RandomForestClassifier(n_jobs=-1, random_state=42)
y_rf_pred = classifierRF.predict(X_test_scaled)
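Random forests also expose which channels drive the decisions (a sketch):
# Per-channel (feature) importances of the fitted RF model
for name, imp in zip(chn_names, classifierRF.feature_importances_):
    print(f'{name}: {imp:.3f}')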
Fill a data matrix cla_data_rf with the pixel-wise RF predictions (scaled to the interval [0, 1])
mx = df_y_pred['Class'].values.max() # Max class value
cla_data_rf = cla_data.copy()
cla_data_rf[dfTest['row'].values, dfTest['col'].values] = y_rf_pred / mx
fig, axes = plt.subplots(1,2, figsize=(20,10))
ax = axes.ravel()
ax[0].imshow(np.multiply(cla_data[:, :], roimask) / mx, cmap=c_cla_cmap, origin='lower')
ax[0].set_title('Tissue-class KNN prediction (K=%d)' % K, fontsize=18)
ax[1].imshow(np.multiply(cla_data_rf[:, :], roimask) / mx, cmap=c_cla_cmap, origin='lower')
ax[1].set_title('Tissue-class RF prediction', fontsize=18)
plt.show()
Compare the two predictions using a confusion matrix, i.e. y_pred_knn ("true") versus y_pred_rf
from utils import plot_confusion_matrix, plot_confusion_matrix_with_colorbar
y_pred_knn = df_y_pred['Class'].values
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_pred_knn, y_rf_pred)
print(cm)
[[ 8157 0 0 17 975 0] [ 11 10268 131 178 180 8] [ 0 1534 2489 0 32 3] [ 79 119 0 2100 104 4] [ 456 271 58 557 6168 202] [ 13 9 35 62 1 1601]]
plot_confusion_matrix_with_colorbar(cm, classes=class_names, title='Confusion matrix - Predicted=RF, "True"=KNN (K=%d)' % K, figsize=(10,10))
plt.ylabel('KNN label ("True")')
plt.xlabel('RF label (Predicted)')
plt.show()
Or, using Seaborn's heatmap:
import seaborn as sn
df_cm = pd.DataFrame(cm, index = [i for i in class_names],
columns = [i for i in class_names])
plt.figure(figsize = (12,10))
ax=sn.heatmap(df_cm, annot=True, annot_kws={"size": 16}, fmt='d')
sn.set(font_scale=2) # for label size
cbar = ax.collections[0].colorbar
cbar.ax.tick_params(labelsize=14)
plt.show()
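A row-normalized version of the same matrix (a sketch; each row then sums to 1) makes the per-class disagreement between KNN and RF easier to read:
# Normalize each KNN-labeled row of the confusion matrix
cm_norm = cm / cm.sum(axis=1, keepdims=True)
plt.figure(figsize=(12, 10))
sn.heatmap(pd.DataFrame(cm_norm, index=class_names, columns=class_names),
           annot=True, annot_kws={"size": 16}, fmt='.2f')
plt.show()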
Could you suggest (or try out) any other classification methods for this task and the given data?
HINT: Check https://scikit-learn.org/stable/supervised_learning.html#supervised-learning