SOA 2019 Predictive Modeling, Innovation and Industry Contest

Computer Vision in Healthcare

Executive Summary

This proposal for the Society of Actuaries (SOA) 2019 Predictive Modeling, Innovation and Industry Contest aims to illustrate the potential of computer vision algorithms in healthcare. The primary objective is to design computer vision algorithms that improve diagnostic accuracy by reducing false positives and false negatives. The proposal studies HAM10000 (Human Against Machine with 10000 training images), a dermatoscopic image database [1]. The objective of the algorithm is to classify images of seven skin conditions, both malignant and non-malignant: Actinic Keratoses, Basal cell carcinoma, Benign keratosis, Dermatofibroma, Melanocytic nevi, Melanoma, and Vascular skin lesions. The algorithm presented is based on a Convolutional Neural Network (CNN) with a ResNet50 backbone [2], pre-trained on ImageNet weights and fine-tuned on the current data. The winners of the International Skin Imaging Collaboration (ISIC) 2018 challenge achieved a balanced multiclass accuracy of 88.5% on the competition test set. Unfortunately, the ISIC 2018 leaderboard is no longer accepting new submissions, so we cannot replicate that score on the test set. However, we replicated the 5-fold cross-validation score reported by the winners of the challenge. Our algorithm achieves 67.9% balanced multiclass accuracy using 5-fold cross-validation on ResNet50, using only the 10,015 training samples. This result is comparable to the 72.1% to 77.0% 5-fold cross-validation scores of ResNet-50 reported by the competition winners, who used 47,822 training samples. Our proposed algorithm uses a single neural network based on ResNet50, whereas the winners of the ISIC 2018 challenge used an ensemble of more than five models and external data, which is computationally more expensive [3].

This artificial intelligence tool is aimed primarily at primary care physicians, who refer patients to specialist-level care. With a more accurate diagnosis, the use of dermatological care and biopsies may be reduced, lowering overall healthcare costs. The tool could also be used by health insurance companies to validate diagnoses, aid in the underwriting process, and evaluate providers in terms of diagnostic accuracy [4]. This tool is not intended to replace the dermatologist's judgment, as an underlying condition may have risk factors (such as age, gender, progression of the lesion, etc.) that are not captured by artificial intelligence (AI) tools based on image data alone.

Introduction

Applications of computer vision in healthcare are numerous. Examples include X-ray pneumonia detection [5], breast cancer classification [6], and skin cancer detection [7], among others. An example of a startup that leverages this technology is Enlitic https://www.enlitic.com/. Enlitic specializes in helping radiologists interpret X-rays, CT scans and MRI studies, in order to prioritize their work. Enlitic claims that it has discovered malignant tumor nodules up to 18 months earlier than the standard of care.

An SOA report (Shang, 2018) highlights the importance of computer vision in insurance applications [4]. In that article, the author presents a solution to a distracted-driver detection problem using CNNs, and also describes the potential of computer vision in other insurance areas, such as underwriting and healthcare. Our proposal focuses on providing more detail on the implementation of computer vision techniques in healthcare. In particular, it focuses on skin cancer detection, but the analysis can be extrapolated to other health conditions. As noted by Shang (2018), medical images can provide richer information for health insurance underwriting and pricing.

This Jupyter notebook is divided into several sections. First, we present a brief overview of the technical requirements to run computer vision models. Second, we provide a descriptive analysis of the HAM10000 dataset. After that, we cover the model setup and the creation of the training and validation sets using cross-validation. The next step illustrates the ResNet50 CNN architecture and the selection of the model parameters. A final step describes the analysis of the results and the limitations of the analysis.

Problem Definition and Risks

Computer vision has many applications in the health insurance marketplace. As noted by Shang (2018), health insurance is an area where image recognition can provide more insight into the risk of individual customers. Physicians can use image recognition for diagnosis and prognosis [4]. In addition to linking doctors’ opinions to insurance pricing, given enough image data and claim experience, medical images such as computed tomography (CT) images of cancer patients may be used directly as pricing factors to help underwriting and derive risk loading and insurance premium rates [4].

Shang (2018) also notes the risks of computer vision models. Regulatory and reputational risk may arise with the application of image recognition [4]. Image data may not be considered private and thus may be judged inappropriate for determining insurance rates and claims [4]. If an insurance company does not fully explain its methods to customers, regulators and the public, using nonpublic image data could cause reputational incidents [4]. Even if the data are usable, the way they are used to assess risks and determine insurance premiums may be questioned [4]. Clients may want to know exactly how the algorithm determines the risk loading for insurance products [4]. Insurers that want to use image recognition technology need to be prepared for explanation and communication [4].

Technical Details

This notebook was created using Jupyter notebook version 5.6.0 and Python 3.7.0. The operating system used is Ubuntu 18.04, on a computer with an NVIDIA 2080 GPU card (8GB memory). For image processing, the use of a solid-state drive is recommended to speed up training, due to the number of read and write operations involved in training computer vision models.

This notebook uses an NVIDIA graphics processing unit (GPU). Configuration instructions for the deep learning environment are available on the NVIDIA website https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html.

The Python libraries used for the analysis include: fastai (version 1.0.56.dev0), a machine learning library that includes code for computer vision models; pandas (version 0.24.2), a library for data manipulation; sklearn (version 0.20.2), for selection of the training, validation, and test sets and for calculation of important metrics; cv2 (version 3.4.3), for image processing; and matplotlib (version 3.0.0), for general-purpose graphics.

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

import matplotlib.pyplot as plt

#Create symbolic link for fastai path - only once
#! ln -s /home/username/fastai/fastai


#Load libraries
import fastai
import sklearn

from fastai.vision import *
import pandas as pd # For data manipulation

from fastai.callbacks import *


from sklearn.model_selection import train_test_split



import cv2
import matplotlib 

print ("fastai version", fastai.__version__)
print ("Pandas version", pd.__version__)
print ("sklearn version", sklearn.__version__)

print ("cv2", cv2.__version__)
print ("matplotlib", matplotlib.__version__)


#To handle LaTeX references
#https://github.com/Jupyter-contrib/jupyter_nbextensions_configurator
#!pip install jupyter_nbextensions_configurator
#jupyter nbextensions_configurator enable --user
#!pip install jupyter_contrib_nbextensions
#!jupyter contrib nbextension install --user
fastai version 1.0.56.dev0
Pandas version 0.24.2
sklearn version 0.20.2
cv2 3.4.3
matplotlib 3.0.0

Data Analysis

The HAM10000 database (Human Against Machine with 10000 training images) contains 10,015 dermatoscopic images released as a training set for academic machine learning purposes [1]. The database is publicly available at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DBW86T, and it is released under a Creative Commons Attribution-NonCommercial 4.0 International Public License.

The HAM10000 database contains 10,015 images of skin conditions corresponding to 7,470 unique lesions, with some lesions having as many as six pictures. We assume that each lesion corresponds to an independent patient. The database includes information for seven diagnoses: Actinic Keratoses (Solar Keratoses) and Intraepithelial Carcinoma (Bowen’s disease) (akiec) (N=228), Basal cell carcinoma (bcc) (N=327), Benign keratosis (bkl) (N=727), Dermatofibroma (df) (N=73), Melanocytic nevi (nv) (N=5,403), Melanoma (mel) (N=614), and Vascular skin lesions (vasc) (N=98). The average age of the patients in the database is 52 years (standard deviation 16 years) (N=7,418; 0.70% missing data), with a minimum age of 0 and a maximum age of 85 years, and 54% male gender (N=4,001). The images of the lesions come from fifteen different body regions, such as lower extremity 21% (N=1,584), back 21% (N=1,534), trunk 17% (N=1,269), abdomen 11% (N=822), and others 30% (N=2,261).

As can be seen from the frequency counts of the skin conditions, this is an imbalanced classification problem, with the most prevalent diagnosis being Melanocytic nevi (N=5,403 patients) and the least prevalent being Dermatofibroma (N=73 patients).

Data Quality

Data was reviewed for consistency according to Actuarial Standard of Practice (ASOP) No. 23. The metadata is believed to be adequate for the purposes of this analysis, with a small percentage of missing data in the age field (0.70% missing data; N=52 patients), and we did not identify any potential outliers. For the image data, Tschandl et al. [1] state that they performed manual adjustments to image histograms to correct for over/under exposure and undesired colour shifts. Additionally, Tschandl et al. [1] also state that more than 50% of lesions have been confirmed by pathology, while the ground truth for the rest of the cases was either follow-up, expert consensus, or confirmation by in-vivo confocal microscopy. Therefore, the data is deemed appropriate for the purposes of this analysis. However, material changes in the data could lead to significant changes in the inference.

Load Metadata

In [2]:
#Load metadata
metadata = pd.read_csv('/home/user/Documents/HAM_Dataset/HAM10000_metadata.csv')
In [3]:
#Path to the folder containing the images
 
path_img = '/home/user/Documents/HAM_Dataset/Data/'
In [4]:
metadata.head()
Out[4]:
lesion_id image_id dx dx_type age sex localization
0 HAM_0000118 ISIC_0027419 bkl histo 80.0 male scalp
1 HAM_0000118 ISIC_0025030 bkl histo 80.0 male scalp
2 HAM_0002730 ISIC_0026769 bkl histo 80.0 male scalp
3 HAM_0002730 ISIC_0025661 bkl histo 80.0 male scalp
4 HAM_0001466 ISIC_0031633 bkl histo 75.0 male ear
In [5]:
patient_data = metadata.drop_duplicates(['lesion_id'])

Descriptive analysis of the Data

In [6]:
metadata.describe(include='all')
Out[6]:
lesion_id image_id dx dx_type age sex localization
count 10015 10015 10015 10015 9958.000000 10015 10015
unique 7470 10015 7 4 NaN 3 15
top HAM_0001863 ISIC_0031764 nv histo NaN male back
freq 6 1 6705 5340 NaN 5406 2192
mean NaN NaN NaN NaN 51.863828 NaN NaN
std NaN NaN NaN NaN 16.968614 NaN NaN
min NaN NaN NaN NaN 0.000000 NaN NaN
25% NaN NaN NaN NaN 40.000000 NaN NaN
50% NaN NaN NaN NaN 50.000000 NaN NaN
75% NaN NaN NaN NaN 65.000000 NaN NaN
max NaN NaN NaN NaN 85.000000 NaN NaN
In [7]:
patient_data.describe(include='all')
Out[7]:
lesion_id image_id dx dx_type age sex localization
count 7470 7470 7470 7470 7418.000000 7470 7470
unique 7470 7470 7 4 NaN 3 15
top HAM_0002673 ISIC_0031764 nv follow_up NaN male lower extremity
freq 1 1 5403 3704 NaN 4001 1584
mean NaN NaN NaN NaN 51.658803 NaN NaN
std NaN NaN NaN NaN 16.016472 NaN NaN
min NaN NaN NaN NaN 0.000000 NaN NaN
25% NaN NaN NaN NaN 40.000000 NaN NaN
50% NaN NaN NaN NaN 50.000000 NaN NaN
75% NaN NaN NaN NaN 65.000000 NaN NaN
max NaN NaN NaN NaN 85.000000 NaN NaN
In [8]:
patient_data['dx'].value_counts()
Out[8]:
nv       5403
bkl       727
mel       614
bcc       327
akiec     228
vasc       98
df         73
Name: dx, dtype: int64
In [9]:
patient_data['sex'].value_counts()
Out[9]:
male       4001
female     3419
unknown      50
Name: sex, dtype: int64
In [10]:
patient_data['localization'].value_counts() 
Out[10]:
lower extremity    1584
back               1534
trunk              1269
abdomen             822
upper extremity     776
face                470
chest               240
foot                228
unknown             203
neck                119
scalp                78
hand                 64
genital              45
ear                  35
acral                 3
Name: localization, dtype: int64

Selected Examples of the Skin Conditions

Below we present an example image of each of the seven skin conditions studied in this database. The goal of the computer vision algorithm is to predict, given an image, which of the seven skin conditions it corresponds to.

In [90]:
#Selected Examples of the Lesions
#Set path 
path_img = '/home/user/Documents/HAM_Dataset/Data/'

import warnings
warnings.filterwarnings("ignore")

#function taken from https://www.datacamp.com/community/tutorials/face-detection-python-opencv
def convertToRGB(image):
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

fig = plt.figure(figsize=(10,10))

#akiec ISIC_0029417

ax1 = fig.add_subplot(331)
ax1.set_title('Actinic Keratoses')
ax1 = cv2.imread(path_img + 'ISIC_0029417.jpg')
ax1 = plt.imshow(convertToRGB(ax1))

#bcc ISIC_0028155

ax2 = fig.add_subplot(332)
ax2.set_title('Basal cell carcinoma')
ax2 = cv2.imread(path_img + 'ISIC_0028155.jpg')
ax2 = plt.imshow(convertToRGB(ax2))

#bkl ISIC_0027419
ax3 = fig.add_subplot(333)
ax3.set_title('Benign keratosis')
ax3 = cv2.imread(path_img + 'ISIC_0027419.jpg')
ax3 = plt.imshow(convertToRGB(ax3))

#df ISIC_0027008
ax4 = fig.add_subplot(334)
ax4.set_title('Dermatofibroma')
ax4 = cv2.imread(path_img + 'ISIC_0027008.jpg')
ax4 = plt.imshow(convertToRGB(ax4))

#nv ISIC_0024698
ax6 = fig.add_subplot(335)
ax6.set_title('Melanocytic nevi')
ax6 = cv2.imread(path_img + 'ISIC_0024698.jpg')
ax6 = plt.imshow(convertToRGB(ax6))

#mel ISIC_0025964
ax5 = fig.add_subplot(336)
ax5.set_title('Melanoma')
ax5 = cv2.imread(path_img + 'ISIC_0025964.jpg')
ax5 = plt.imshow(convertToRGB(ax5))

#vasc ISIC_0031197
ax7 = fig.add_subplot(337)
ax7.set_title('Vascular skin lesions')
ax7 = cv2.imread(path_img + 'ISIC_0031197.jpg')
ax7 = plt.imshow(convertToRGB(ax7))

fig.show()

Creation of the training and validation sets

The next step in the modeling process is the creation of the training and validation sets. We do this using 5-fold stratified cross-validation, so that each fold has similar proportions of every diagnosis. Since several lesions have repeated pictures, we make sure that no lesion appears in both the training and validation sets. Allowing such overlap would constitute validation set leakage, which would most likely inflate validation scores, as the model could memorize the images of a particular lesion rather than generalize to unseen data.
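As a sanity check for the splits built below, the absence of lesion overlap can be verified by intersecting the lesion ids of the two sets. A minimal sketch with hypothetical toy data (the real check would be run on each metadata_train_valid dataframe):

```python
import pandas as pd

#Hypothetical toy metadata: lesion_id repeats when a lesion has several images
metadata = pd.DataFrame({
    'lesion_id': ['HAM_1', 'HAM_1', 'HAM_2', 'HAM_3', 'HAM_3', 'HAM_4'],
    'is_validation': [False, False, False, True, True, True],
})

#Lesion ids assigned to training vs. validation
train_lesions = set(metadata.loc[~metadata['is_validation'], 'lesion_id'])
valid_lesions = set(metadata.loc[metadata['is_validation'], 'lesion_id'])

#No lesion should appear in both sets - any overlap is validation set leakage
overlap = train_lesions & valid_lesions
assert not overlap, f"Leaking lesions: {overlap}"
print(len(overlap))  # 0
```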

In [12]:
#Creation of a database by lession id

from sklearn.model_selection import StratifiedKFold
skf = StratifiedKFold(n_splits=5, random_state=42)

lesion_id = pd.DataFrame({'lesion_id':list(metadata['lesion_id'].value_counts().index)})

patient_data = metadata.drop_duplicates(['lesion_id'])
patient_data = patient_data.reset_index(drop=True)
X = patient_data['lesion_id']
y = patient_data['dx']
In [13]:
train_1, train_2, train_3, train_4, train_5 = [patient_data.loc[i[0]] for i in skf.split(X, y)]
In [14]:
valid_1, valid_2, valid_3, valid_4, valid_5 = [patient_data.loc[i[1]] for i in skf.split(X, y)]
In [15]:
train_1 = train_1.assign(database = "train")
train_1 = train_1.assign(is_validation = False)
valid_1 = valid_1.assign(database = "validation")
valid_1 = valid_1.assign(is_validation = True)
In [16]:
train_2 = train_2.assign(database = "train")
train_2 = train_2.assign(is_validation = False)
valid_2 = valid_2.assign(database = "validation")
valid_2 = valid_2.assign(is_validation = True)

train_3 = train_3.assign(database = "train")
train_3 = train_3.assign(is_validation = False)
valid_3 = valid_3.assign(database = "validation")
valid_3 = valid_3.assign(is_validation = True)

train_4 = train_4.assign(database = "train")
train_4 = train_4.assign(is_validation = False)
valid_4 = valid_4.assign(database = "validation")
valid_4 = valid_4.assign(is_validation = True)

train_5 = train_5.assign(database = "train")
train_5 = train_5.assign(is_validation = False)
valid_5 = valid_5.assign(database = "validation")
valid_5 = valid_5.assign(is_validation = True)
In [17]:
#Concatenate the databases
metadata_train_valid = pd.concat([train_1, valid_1]) 
metadata_train_valid2 = pd.concat([train_2, valid_2])
metadata_train_valid3 = pd.concat([train_3, valid_3])
metadata_train_valid4 = pd.concat([train_4, valid_4])
metadata_train_valid5 = pd.concat([train_5, valid_5])
In [18]:
metadata_train_valid = pd.merge(metadata, metadata_train_valid[['lesion_id', 'is_validation']], on=['lesion_id'])
metadata_train_valid2 = pd.merge(metadata, metadata_train_valid2[['lesion_id', 'is_validation']], on=['lesion_id'])
metadata_train_valid3 = pd.merge(metadata, metadata_train_valid3[['lesion_id', 'is_validation']], on=['lesion_id'])
metadata_train_valid4 = pd.merge(metadata, metadata_train_valid4[['lesion_id', 'is_validation']], on=['lesion_id'])
metadata_train_valid5 = pd.merge(metadata, metadata_train_valid5[['lesion_id', 'is_validation']], on=['lesion_id'])
In [19]:
pd.crosstab(metadata_train_valid.dx, metadata_train_valid.is_validation) #metadata_train_valid['dx'].value_counts()
Out[19]:
is_validation False True
dx
akiec 268 59
bcc 400 114
bkl 810 289
df 88 27
mel 890 223
nv 5623 1082
vasc 105 37
In [20]:
pd.crosstab(metadata_train_valid2.dx, metadata_train_valid2.is_validation) #metadata_train_valid2['dx'].value_counts()
Out[20]:
is_validation False True
dx
akiec 262 65
bcc 396 118
bkl 914 185
df 85 30
mel 847 266
nv 5624 1081
vasc 99 43
In [21]:
pd.crosstab(metadata_train_valid3.dx, metadata_train_valid3.is_validation) #metadata_train_valid3['dx'].value_counts()
Out[21]:
is_validation False True
dx
akiec 256 71
bcc 435 79
bkl 886 213
df 85 30
mel 857 256
nv 5624 1081
vasc 118 24
In [22]:
pd.crosstab(metadata_train_valid4.dx, metadata_train_valid4.is_validation) # metadata_train_valid4['dx'].value_counts()
Out[22]:
is_validation False True
dx
akiec 251 76
bcc 412 102
bkl 847 252
df 101 14
mel 942 171
nv 4882 1823
vasc 123 19
In [23]:
pd.crosstab(metadata_train_valid5.dx, metadata_train_valid5.is_validation) #metadata_train_valid5['dx'].value_counts()
Out[23]:
is_validation False True
dx
akiec 271 56
bcc 413 101
bkl 939 160
df 101 14
mel 916 197
nv 5067 1638
vasc 123 19

Modeling

The training takes place in the fastai library [8], which sits on top of PyTorch, a popular deep learning library. Installation instructions for the fastai library can be found at https://docs.fast.ai/install.html.

The architecture selected for the CNN is ResNet50 [2], pre-trained on ImageNet weights. ResNet is a popular architecture for training deep neural networks that uses a residual learning framework [2]. We initialize the network weights using ImageNet. Training takes place in two stages: first, with images resized to 224 × 224, we train only the head (the final layers of the network); then we unfreeze the remaining layers and train the rest of the network. The original image size is 600 × 450.

To train the neural network we need to normalize the data. To do so, we calculate the mean and standard deviation of the images and use these statistics to normalize the image set.

Model parameters

Dropout regularization, a method that randomly removes connections during training to reduce overfitting, was set to 50%.

Batch size: This is the number of images processed by the learning process at a time, and it depends on the GPU memory. In this case, using an NVIDIA 2080 card we can accommodate 32 images of size 224 × 224.

Loss Function: We optimize a weighted cross-entropy loss. To deal with class imbalance, the weight of each of the seven skin conditions is the inverse of its frequency. That is, the loss function is penalized more heavily if the prediction for a rare class is wrong. Without a weighted loss function, the network tends to predict the most frequent class.

Learning rate: After initializing the weights, we also need to find the learning rate. This is one of the most important parameters when training neural networks. We select it using cyclical learning rates, as described in https://arxiv.org/abs/1506.01186.

Data augmentation is a technique to artificially create new data and reduce overfitting. It is available in fastai through the function tfms = get_transforms(flip_vert = True). The transformations applied to the data are: random horizontal flip, vertical flip, rotation, zoom, a random lighting and contrast change, and a random symmetric warp.

Test-time augmentation was used at the end of every fold to improve the accuracy of the predictions. This is accomplished by applying random transformations to every validation image and averaging all of the resulting predictions.
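fastai performs test-time augmentation internally when tta=True is passed to ClassificationInterpretation.from_learner. The underlying idea can be sketched in plain NumPy; the predict function below is a hypothetical stand-in for a trained network, not the notebook's model:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(image):
    #Hypothetical stand-in for the trained network: returns 7 class probabilities
    logits = image.mean(axis=(0, 1)) + rng.normal(0, 0.01, size=7)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def augment(image, k):
    #Simple deterministic augmentations: horizontal flips and 90-degree rotations
    if k % 2 == 1:
        image = image[:, ::-1]          #horizontal flip
    return np.rot90(image, k // 2)      #rotate by (k // 2) * 90 degrees

def tta_predict(image, n_aug=8):
    #Test-time augmentation: average the predictions over augmented copies
    preds = [predict(augment(image, k)) for k in range(n_aug)]
    return np.mean(preds, axis=0)

image = rng.random((224, 224, 7))       #toy array standing in for an image
probs = tta_predict(image)
print(probs.shape, round(probs.sum(), 6))  # (7,) 1.0
```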

In [24]:
#Make the training process deterministic 
#function taken from kernel https://www.kaggle.com/iafoss/hypercolumns-pneumothorax-fastai-0-831-lb

SEED = 2019

def seed_everything(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    #tf.set_random_seed(seed)
seed_everything(SEED)
In [25]:
#Normalization statistics 
import cv2


img_stats = []

def open_image(fname):
    img = cv2.imread(os.path.join(path_img, fname + '.jpg'))/255
    #img = cv2.resize(img, (sz, sz)) / 255
    x_tot = img.mean() #image statistics
    x2_tot = (img**2).mean()
    
    return x_tot, x2_tot


for fname in metadata['image_id']:
    img_stats.append(open_image(fname))
In [26]:
#function taken from https://www.kaggle.com/iafoss/data-repack-and-image-statistics
def get_stats(stats): # get dataset statistics 
    x_tot, x2_tot = 0.0, 0.0
    for x, x2 in stats:
        x_tot += x
        x2_tot += x2
    
    img_avr =  x_tot/len(stats)
    img_std =  np.sqrt(x2_tot/len(stats) - img_avr**2)
    print('mean:',img_avr, ', std:', img_std)
    
    
get_stats(img_stats)
mean: 0.6267264561492303 , std: 0.18320752802569892

Fold 1

In [28]:
epochs_head = 4
epochs_all = 15
patience = 3
In [29]:
from sklearn.metrics import confusion_matrix
In [30]:
#Class weights to deal with class imbalance
#New weights - inverse of class frequency

sample_size = 10015
num_classes = 7


#Image-level class counts, in alphabetical class order
w1 = sample_size/(327)  #akiec
w2 = sample_size/(514)  #bcc
w3 = sample_size/(1099) #bkl
w4 = sample_size/(115)  #df
w5 = sample_size/(1113) #mel
w6 = sample_size/(6705) #nv
w7 = sample_size/(142)  #vasc


weights = [w1, w2, w3, w4, w5, w6, w7]
class_weights = torch.FloatTensor(weights).cuda()


print(weights)
[30.62691131498471, 19.48443579766537, 9.112829845313922, 87.08695652173913, 8.998203054806828, 1.493661446681581, 70.52816901408451]
In [31]:
#Increase image size


gc.collect()

bs = 32 #batch size - depends on the GPU memory and number of GPUs available. For the current
#NVIDIA 2080 GPU with 8GB memory, we can use bs = 32


def get_data(metadata_train_valid, bs, size):
    stats = ([0.626, 0.626, 0.626], [0.1829, 0.1829, 0.1829])
    
    data = (ImageList.from_df(metadata_train_valid, path_img, cols='image_id', suffix='.jpg')
        .split_from_df(col='is_validation')
        .label_from_df(cols='dx')
        .transform(size=(size, size), tfms = get_transforms(flip_vert = True))
        .databunch(bs=bs)).normalize(stats)
    
    return data


data = get_data(metadata_train_valid, bs, 224)

#Define Convolutional Neural Network with backbone resnet50
learn = cnn_learner(data, models.resnet50, metrics = [error_rate, accuracy],
                   callback_fns=[ShowGraph])

learn.loss_func=nn.CrossEntropyLoss(weight=class_weights)
In [32]:
#Learn rate finder
learn.lr_find()
learn.recorder.plot()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
In [33]:
#Fit heads
seed_everything(SEED)

learn.clip_grad(1.0);

lr = 1e-3
learn.fit_one_cycle(epochs_head, max_lr = lr,   
                    callbacks=[SaveModelCallback(learn, monitor='accuracy',
                                                 mode='max', name='Best_model_sz224_resnet50_fold1')])
Total time: 02:57

epoch train_loss valid_loss error_rate accuracy time
0 1.495504 1.219684 0.426543 0.573457 00:43
1 0.939320 0.888996 0.320044 0.679956 00:45
2 0.816192 0.945954 0.344074 0.655926 00:43
3 0.704094 0.787698 0.287821 0.712179 00:44
Better model found at epoch 0 with accuracy value: 0.5734571218490601.
Better model found at epoch 1 with accuracy value: 0.679956316947937.
Better model found at epoch 3 with accuracy value: 0.7121791243553162.
In [34]:
learn.load('Best_model_sz224_resnet50_fold1')
learn.unfreeze()
In [35]:
learn.lr_find()
learn.recorder.plot()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
In [36]:
learn.fit_one_cycle(epochs_all, max_lr = slice(1e-6, 1e-5), wd=0.05, 
                    callbacks=[SaveModelCallback(learn, monitor='accuracy',
                              mode='max', name='Best_model_sz224_stage2_resnet50_fold1'),
                              ReduceLROnPlateauCallback(learn, monitor='accuracy', 
                              mode='max', patience = patience)])
Total time: 14:48

epoch train_loss valid_loss error_rate accuracy time
0 0.630204 0.839754 0.301475 0.698525 00:59
1 0.641056 0.834359 0.301475 0.698525 00:59
2 0.691175 0.804068 0.290005 0.709995 00:59
3 0.599363 0.784751 0.283452 0.716548 00:58
4 0.623151 0.802924 0.286729 0.713271 00:58
5 0.603048 0.836452 0.299290 0.700710 00:58
6 0.617417 0.847394 0.296013 0.703987 00:58
7 0.517802 0.794418 0.285636 0.714364 00:58
8 0.539848 0.791049 0.280721 0.719279 00:58
9 0.522666 0.799893 0.285090 0.714910 00:58
10 0.498964 0.768912 0.274713 0.725287 00:58
11 0.529219 0.753789 0.263790 0.736210 00:58
12 0.491320 0.781869 0.275806 0.724194 00:58
13 0.507704 0.774308 0.275259 0.724741 00:58
14 0.521256 0.808908 0.288913 0.711087 00:58
Better model found at epoch 0 with accuracy value: 0.6985253691673279.
Better model found at epoch 2 with accuracy value: 0.7099945545196533.
Better model found at epoch 3 with accuracy value: 0.7165483236312866.
Epoch 7: reducing lr to 1.4996633150096014e-06
Better model found at epoch 8 with accuracy value: 0.7192791104316711.
Better model found at epoch 10 with accuracy value: 0.7252867221832275.
Better model found at epoch 11 with accuracy value: 0.7362097501754761.
In [37]:
learn.load('Best_model_sz224_stage2_resnet50_fold1')

interp1 = ClassificationInterpretation.from_learner(learn)
print(interp1.confusion_matrix().sum())

losses,idxs = interp1.top_losses()
len(data.valid_ds)==len(losses)==len(idxs)
1831
Out[37]:
True
In [38]:
interp1.plot_confusion_matrix()
In [39]:
confusion_matrix1 = interp1.confusion_matrix()

FP = confusion_matrix1.sum(axis=0) - np.diag(confusion_matrix1)  
FN = confusion_matrix1.sum(axis=1) - np.diag(confusion_matrix1)
TP = np.diag(confusion_matrix1)
TN = confusion_matrix1.sum() - (FP + FN + TP)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)

TPR.mean()
Out[39]:
0.6537219451416985
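The quantity above, the mean of the per-class true positive rates (sensitivities), is the balanced multiclass accuracy used throughout this proposal. On a small hypothetical example it coincides with scikit-learn's balanced_accuracy_score:

```python
from sklearn.metrics import balanced_accuracy_score, recall_score

#Hypothetical ground-truth and predicted labels for a 3-class problem
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2, 2]

#Balanced multiclass accuracy = mean of the per-class recalls:
#class 0: 3/4, class 1: 1/2, class 2: 2/2 -> mean = 0.75
macro_recall = recall_score(y_true, y_pred, average='macro')
balanced = balanced_accuracy_score(y_true, y_pred)
print(macro_recall, balanced)  # 0.75 0.75
```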
In [40]:
interp1_tta = ClassificationInterpretation.from_learner(learn, tta=True)
confusion_matrix1_tta = interp1_tta.confusion_matrix()

FP = confusion_matrix1_tta.sum(axis=0) - np.diag(confusion_matrix1_tta)  
FN = confusion_matrix1_tta.sum(axis=1) - np.diag(confusion_matrix1_tta)
TP = np.diag(confusion_matrix1_tta)
TN = confusion_matrix1_tta.sum() - (FP + FN + TP)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)

TPR.mean()
Total time: 00:44

Out[40]:
0.6699866694639852

Fold 2

In [41]:
gc.collect()
learn.destroy()
seed_everything(SEED)
bs = 32
data = get_data(metadata_train_valid2, bs, 224)

#Define Convolutional Neural Network with backbone resnet50
learn = cnn_learner(data, models.resnet50, metrics = [error_rate, accuracy],
                   callback_fns=[ShowGraph])

learn.loss_func=nn.CrossEntropyLoss(weight=class_weights)
this Learner object self-destroyed - it still exists, but no longer usable
In [42]:
#Learn rate finder
learn.lr_find()
learn.recorder.plot()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
In [43]:
#Fit heads
learn.clip_grad(1.0);
lr =  1e-3
learn.fit_one_cycle(epochs_head, max_lr = lr, 
                    callbacks=[SaveModelCallback(learn, monitor='accuracy',
                                                 mode='max', name='Best_model_sz224_resnet50_fold2')])
Total time: 02:53

epoch train_loss valid_loss error_rate accuracy time
0 1.467729 1.015799 0.338926 0.661074 00:43
1 1.024468 0.597618 0.196868 0.803132 00:42
2 0.833893 0.575928 0.171141 0.828859 00:43
3 0.671142 0.558927 0.178412 0.821588 00:42
Better model found at epoch 0 with accuracy value: 0.6610738039016724.
Better model found at epoch 1 with accuracy value: 0.8031319975852966.
Better model found at epoch 2 with accuracy value: 0.8288590312004089.
In [44]:
learn.load('Best_model_sz224_resnet50_fold2')
learn.unfreeze()
In [45]:
learn.lr_find()
learn.recorder.plot()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
In [46]:
learn.fit_one_cycle(epochs_all, max_lr = slice(1e-5, 1e-4), wd=0.05, 
                    callbacks=[SaveModelCallback(learn, monitor='accuracy',
                                                 mode='max', name='Best_model_sz224_stage2_resnet50_fold2'),
        ReduceLROnPlateauCallback(learn, monitor='accuracy', mode='max', patience = patience)])
Total time: 14:33

epoch train_loss valid_loss error_rate accuracy time
0 0.681718 0.555275 0.166107 0.833893 00:57
1 0.628009 0.546218 0.167226 0.832774 00:58
2 0.630218 0.582755 0.176734 0.823266 00:57
3 0.584994 0.588226 0.185123 0.814877 00:58
4 0.542188 0.557275 0.187360 0.812640 00:57
5 0.450436 0.529461 0.167226 0.832774 00:57
6 0.404708 0.560523 0.166107 0.833893 00:57
7 0.356681 0.590561 0.184004 0.815996 00:58
8 0.297416 0.600775 0.163870 0.836130 00:57
9 0.285370 0.581090 0.157159 0.842841 00:57
10 0.244933 0.591917 0.154922 0.845078 00:57
11 0.231776 0.566462 0.152125 0.847875 00:57
12 0.199275 0.561260 0.146532 0.853468 00:57
13 0.192746 0.579187 0.151007 0.848993 00:57
14 0.171276 0.574982 0.153803 0.846197 00:57
Better model found at epoch 0 with accuracy value: 0.8338926434516907.
Epoch 4: reducing lr to 1.9887481084816066e-05
Better model found at epoch 8 with accuracy value: 0.8361297249794006.
Better model found at epoch 9 with accuracy value: 0.8428411483764648.
Better model found at epoch 10 with accuracy value: 0.8450782895088196.
Better model found at epoch 11 with accuracy value: 0.8478747010231018.
Better model found at epoch 12 with accuracy value: 0.853467583656311.
In [47]:
learn.load('Best_model_sz224_stage2_resnet50_fold2')

interp2 = ClassificationInterpretation.from_learner(learn)
print(interp2.confusion_matrix().sum())

losses,idxs = interp2.top_losses()
len(data.valid_ds)==len(losses)==len(idxs)
1788
Out[47]:
True
In [48]:
confusion_matrix2 = interp2.confusion_matrix()


FP = confusion_matrix2.sum(axis=0) - np.diag(confusion_matrix2)  
FN = confusion_matrix2.sum(axis=1) - np.diag(confusion_matrix2)
TP = np.diag(confusion_matrix2)
TN = confusion_matrix2.sum() - (FP + FN + TP)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)


TPR.mean()
Out[48]:
0.6378821047093878
In [49]:
interp2_tta = ClassificationInterpretation.from_learner(learn, tta=True)

confusion_matrix2_tta = interp2_tta.confusion_matrix()


FP = confusion_matrix2_tta.sum(axis=0) - np.diag(confusion_matrix2_tta)  
FN = confusion_matrix2_tta.sum(axis=1) - np.diag(confusion_matrix2_tta)
TP = np.diag(confusion_matrix2_tta)
TN = confusion_matrix2_tta.sum() - (FP + FN + TP)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)


TPR.mean()
Total time: 00:42

Out[49]:
0.6447937710613377

Fold 3

In [50]:
gc.collect()
learn.destroy()
seed_everything(SEED)
bs = 32
data = get_data(metadata_train_valid3, bs, 224)

#Define Convolutional Neural Network with backbone resnet50
learn = cnn_learner(data, models.resnet50, metrics = [error_rate, accuracy],
                   callback_fns=[ShowGraph])

learn.loss_func=nn.CrossEntropyLoss(weight=class_weights)
this Learner object self-destroyed - it still exists, but no longer usable
In [51]:
# Learning rate finder
learn.lr_find()
learn.recorder.plot()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
In [52]:
# Fit the head layers (backbone frozen)
learn.clip_grad(1.0);
lr= 1e-3
learn.fit_one_cycle(epochs_head,  max_lr = lr, 
                    callbacks=[SaveModelCallback(learn, monitor='accuracy',
                                                 mode='max', name='Best_model_sz224_resnet50_fold3')])
Total time: 02:54

epoch train_loss valid_loss error_rate accuracy time
0 1.590884 0.870534 0.293615 0.706385 00:43
1 1.025691 0.460051 0.157925 0.842075 00:43
2 0.819632 0.441824 0.149373 0.850627 00:43
3 0.694179 0.444519 0.153934 0.846066 00:42
Better model found at epoch 0 with accuracy value: 0.7063854336738586.
Better model found at epoch 1 with accuracy value: 0.8420752286911011.
Better model found at epoch 2 with accuracy value: 0.8506271243095398.
In [53]:
learn.load('Best_model_sz224_resnet50_fold3')

learn.unfreeze()
In [54]:
learn.lr_find()
learn.recorder.plot()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
In [55]:
learn.fit_one_cycle(epochs_all, max_lr = slice(1e-5, 1e-4), 
                    callbacks=[SaveModelCallback(learn, monitor='accuracy',
                                                 mode='max', name='Best_model_sz224_stage2_resnet50_fold3'),
                              ReduceLROnPlateauCallback(learn, monitor='accuracy', mode='max', patience = patience)])
Total time: 14:41

epoch train_loss valid_loss error_rate accuracy time
0 0.717923 0.437675 0.150513 0.849487 00:58
1 0.708005 0.433533 0.148803 0.851197 00:57
2 0.610181 0.408890 0.129418 0.870582 00:58
3 0.587448 0.432712 0.146522 0.853478 00:57
4 0.597028 0.429640 0.139681 0.860319 00:58
5 0.495178 0.450995 0.137400 0.862600 00:57
6 0.447768 0.458231 0.136260 0.863740 00:57
7 0.339780 0.465034 0.135120 0.864880 00:59
8 0.306543 0.458970 0.140251 0.859749 00:59
9 0.254653 0.519076 0.144242 0.855758 00:59
10 0.278268 0.475927 0.131129 0.868871 00:58
11 0.231285 0.444155 0.127138 0.872862 00:58
12 0.209446 0.465456 0.130559 0.869441 00:58
13 0.213950 0.461254 0.127708 0.872292 00:58
14 0.183432 0.463150 0.127138 0.872862 00:59
Better model found at epoch 0 with accuracy value: 0.8494868874549866.
Better model found at epoch 1 with accuracy value: 0.8511972427368164.
Better model found at epoch 2 with accuracy value: 0.8705815076828003.
Epoch 6: reducing lr to 1.7330529396223397e-05
Epoch 10: reducing lr to 6.346644369977027e-06
Better model found at epoch 11 with accuracy value: 0.8728620409965515.
In [56]:
learn.load('Best_model_sz224_stage2_resnet50_fold3')

interp3 = ClassificationInterpretation.from_learner(learn)
print(interp3.confusion_matrix().sum())

losses,idxs = interp3.top_losses()
len(data.valid_ds)==len(losses)==len(idxs)
1754
Out[56]:
True
In [57]:
confusion_matrix3 = interp3.confusion_matrix()


FP = confusion_matrix3.sum(axis=0) - np.diag(confusion_matrix3)  
FN = confusion_matrix3.sum(axis=1) - np.diag(confusion_matrix3)
TP = np.diag(confusion_matrix3)
TN = confusion_matrix3.sum() - (FP + FN + TP)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)


TPR.mean()
Out[57]:
0.7765075886881279
In [58]:
interp3_tta = ClassificationInterpretation.from_learner(learn, tta=True)


confusion_matrix3_tta = interp3_tta.confusion_matrix()


FP = confusion_matrix3_tta.sum(axis=0) - np.diag(confusion_matrix3_tta)  
FN = confusion_matrix3_tta.sum(axis=1) - np.diag(confusion_matrix3_tta)
TP = np.diag(confusion_matrix3_tta)
TN = confusion_matrix3_tta.sum() - (FP + FN + TP)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)


TPR.mean()
Total time: 00:41

Out[58]:
0.7898711317061807

Fold 4

In [59]:
gc.collect()
learn.destroy()
seed_everything(SEED)
bs = 32
data = get_data(metadata_train_valid4, bs, 224)

#Define Convolutional Neural Network with backbone resnet50
learn = cnn_learner(data, models.resnet50, metrics = [error_rate, accuracy],
                   callback_fns=[ShowGraph])

learn.loss_func=nn.CrossEntropyLoss(weight=class_weights)
this Learner object self-destroyed - it still exists, but no longer usable
In [60]:
# Learning rate finder
learn.lr_find()
learn.recorder.plot()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
In [61]:
# Fit the head layers (backbone frozen)
learn.clip_grad(1.0);
lr = 1e-3
learn.fit_one_cycle(epochs_head,  max_lr = lr, 
                    callbacks=[SaveModelCallback(learn, monitor='accuracy',
                                                 mode='max', name='Best_model_sz224_resnet50_fold4')])
Total time: 02:53

epoch train_loss valid_loss error_rate accuracy time
0 1.492628 2.018486 0.601140 0.398860 00:42
1 1.048403 1.989057 0.617013 0.382987 00:43
2 0.812597 1.555402 0.549451 0.450549 00:44
3 0.671437 1.738290 0.575092 0.424908 00:42
Better model found at epoch 0 with accuracy value: 0.3988603949546814.
Better model found at epoch 2 with accuracy value: 0.450549453496933.
In [62]:
learn.load('Best_model_sz224_resnet50_fold4')

learn.unfreeze()
In [63]:
# Learning rate finder
learn.lr_find()
learn.recorder.plot()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
In [64]:
learn.fit_one_cycle(epochs_all, max_lr = slice(1e-5, 1e-4), wd=0.05, 
                    callbacks=[SaveModelCallback(learn, monitor='accuracy',
                                                 mode='max', name='Best_model_sz224_stage2_resnet50_fold4'),
                              ReduceLROnPlateauCallback(learn, monitor='accuracy', mode='max', patience = patience)])
Total time: 14:11

epoch train_loss valid_loss error_rate accuracy time
0 0.710640 1.562170 0.549451 0.450549 00:55
1 0.686482 1.628219 0.543753 0.456247 00:56
2 0.660732 1.805500 0.584860 0.415140 00:56
3 0.593029 1.581686 0.511193 0.488807 00:56
4 0.530795 1.492337 0.526659 0.473341 00:56
5 0.431930 1.691596 0.541311 0.458689 00:57
6 0.413762 1.521912 0.483109 0.516891 00:55
7 0.400882 1.838967 0.516077 0.483923 00:56
8 0.317534 1.374171 0.442409 0.557591 00:56
9 0.270573 1.484009 0.455026 0.544974 00:57
10 0.223583 1.250241 0.400081 0.599919 00:56
11 0.215717 1.371741 0.428978 0.571022 00:56
12 0.189399 1.302813 0.397639 0.602361 00:56
13 0.192204 1.514861 0.438339 0.561661 00:55
14 0.184456 1.332400 0.407814 0.592186 00:56
Better model found at epoch 0 with accuracy value: 0.450549453496933.
Better model found at epoch 1 with accuracy value: 0.4562474489212036.
Better model found at epoch 3 with accuracy value: 0.4888074994087219.
Better model found at epoch 6 with accuracy value: 0.5168905258178711.
Better model found at epoch 8 with accuracy value: 0.5575905442237854.
Better model found at epoch 10 with accuracy value: 0.5999186038970947.
Better model found at epoch 12 with accuracy value: 0.6023606061935425.
In [65]:
learn.load('Best_model_sz224_stage2_resnet50_fold4')

interp4 = ClassificationInterpretation.from_learner(learn)
print(interp4.confusion_matrix().sum())

losses,idxs = interp4.top_losses()
len(data.valid_ds)==len(losses)==len(idxs)
2457
Out[65]:
True
In [66]:
confusion_matrix4 = interp4.confusion_matrix()


FP = confusion_matrix4.sum(axis=0) - np.diag(confusion_matrix4)  
FN = confusion_matrix4.sum(axis=1) - np.diag(confusion_matrix4)
TP = np.diag(confusion_matrix4)
TN = confusion_matrix4.sum() - (FP + FN + TP)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)

TPR.mean()
Out[66]:
0.7053206236926929
In [67]:
interp4_tta = ClassificationInterpretation.from_learner(learn, tta=True)
confusion_matrix4_tta = interp4_tta.confusion_matrix()


FP = confusion_matrix4_tta.sum(axis=0) - np.diag(confusion_matrix4_tta)  
FN = confusion_matrix4_tta.sum(axis=1) - np.diag(confusion_matrix4_tta)
TP = np.diag(confusion_matrix4_tta)
TN = confusion_matrix4_tta.sum() - (FP + FN + TP)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)

TPR.mean()
Total time: 00:58

Out[67]:
0.7151314019463104

Fold 5

In [68]:
gc.collect()
learn.destroy()
seed_everything(SEED)
bs = 32
data = get_data(metadata_train_valid5, bs, 224)

#Define Convolutional Neural Network with backbone resnet50
learn = cnn_learner(data, models.resnet50, metrics = [error_rate, accuracy],
                   callback_fns=[ShowGraph])

learn.loss_func=nn.CrossEntropyLoss(weight=class_weights)
this Learner object self-destroyed - it still exists, but no longer usable
In [69]:
# Learning rate finder
learn.lr_find()
learn.recorder.plot()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
In [70]:
# Fit the head layers (backbone frozen)
learn.clip_grad(1.0);
lr = 1e-3
learn.fit_one_cycle(epochs_head, wd=0.05, max_lr = lr, 
                    callbacks=[SaveModelCallback(learn, monitor='accuracy',
                                                 mode='max', name='Best_model_sz224_resnet50_fold5')])
Total time: 02:58

epoch train_loss valid_loss error_rate accuracy time
0 1.493403 3.474834 0.797254 0.202746 00:44
1 1.032310 2.029426 0.714874 0.285126 00:43
2 0.834895 2.003390 0.669565 0.330435 00:45
3 0.613229 1.923828 0.672311 0.327689 00:43
Better model found at epoch 0 with accuracy value: 0.20274598896503448.
Better model found at epoch 1 with accuracy value: 0.28512585163116455.
Better model found at epoch 2 with accuracy value: 0.33043476939201355.
In [71]:
learn.load('Best_model_sz224_resnet50_fold5')

learn.unfreeze()
In [72]:
# Learning rate finder
learn.lr_find()
learn.recorder.plot()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
In [73]:
learn.fit_one_cycle(epochs_all, max_lr = slice(1e-5, 1e-4), wd=0.05, 
                    callbacks=[SaveModelCallback(learn, monitor='accuracy',
                                                 mode='max', name='Best_model_sz224_stage2_resnet50_fold5'),
                    ReduceLROnPlateauCallback(learn, monitor='accuracy', mode='max', patience = patience)])
Total time: 14:42

epoch train_loss valid_loss error_rate accuracy time
0 0.599751 1.898071 0.660412 0.339588 00:56
1 0.612804 1.914753 0.674600 0.325400 00:57
2 0.624066 2.138207 0.686499 0.313501 00:57
3 0.551573 2.306092 0.729062 0.270938 00:58
4 0.522837 2.133091 0.640275 0.359725 00:58
5 0.446836 1.879109 0.613730 0.386270 00:57
6 0.376646 2.046531 0.620137 0.379863 00:57
7 0.354499 2.168431 0.653089 0.346911 00:57
8 0.312210 1.944573 0.591304 0.408696 00:58
9 0.247157 1.918137 0.554691 0.445309 00:57
10 0.206232 1.909455 0.560641 0.439359 00:59
11 0.202486 2.022898 0.547368 0.452632 01:01
12 0.182351 1.976655 0.544622 0.455378 01:02
13 0.159520 1.815552 0.513501 0.486499 00:58
14 0.164680 1.998900 0.551030 0.448970 00:58
Better model found at epoch 0 with accuracy value: 0.33958810567855835.
Better model found at epoch 4 with accuracy value: 0.35972538590431213.
Better model found at epoch 5 with accuracy value: 0.38627001643180847.
Better model found at epoch 8 with accuracy value: 0.40869563817977905.
Better model found at epoch 9 with accuracy value: 0.4453089237213135.
Better model found at epoch 11 with accuracy value: 0.4526315927505493.
Better model found at epoch 12 with accuracy value: 0.45537757873535156.
Better model found at epoch 13 with accuracy value: 0.4864988625049591.
In [74]:
learn.load('Best_model_sz224_stage2_resnet50_fold5')

interp5 = ClassificationInterpretation.from_learner(learn)
print(interp5.confusion_matrix().sum())

losses,idxs = interp5.top_losses()
len(data.valid_ds)==len(losses)==len(idxs)
2185
Out[74]:
True
In [75]:
confusion_matrix5 = interp5.confusion_matrix()


FP = confusion_matrix5.sum(axis=0) - np.diag(confusion_matrix5)  
FN = confusion_matrix5.sum(axis=1) - np.diag(confusion_matrix5)
TP = np.diag(confusion_matrix5)
TN = confusion_matrix5.sum() - (FP + FN + TP)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)


TPR.mean()
Out[75]:
0.6755170098556844
In [76]:
interp5_tta = ClassificationInterpretation.from_learner(learn, tta=True)

confusion_matrix5_tta = interp5_tta.confusion_matrix()


FP = confusion_matrix5_tta.sum(axis=0) - np.diag(confusion_matrix5_tta)  
FN = confusion_matrix5_tta.sum(axis=1) - np.diag(confusion_matrix5_tta)
TP = np.diag(confusion_matrix5_tta)
TN = confusion_matrix5_tta.sum() - (FP + FN + TP)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)


TPR.mean()
Total time: 00:51

Out[76]:
0.6691429111511357

Analysis of the Cross-Validated Results

We obtain a balanced multiclass accuracy of 66.8% without augmentation and 67.9% with test-time augmentation (TTA). Details of the other classification metrics are presented below.
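Balanced multiclass accuracy is simply the mean per-class sensitivity: the diagonal of the confusion matrix divided by its row sums, averaged over the classes. A minimal sketch with a toy 3×3 matrix (illustrative data, not our actual results) shows the calculation that the cells below perform on the aggregated folds:

```python
import numpy as np

def balanced_accuracy(cm):
    """Mean per-class recall: the confusion-matrix diagonal divided
    by the row sums (true counts per class), averaged over classes."""
    cm = np.asarray(cm, dtype=float)
    return (np.diag(cm) / cm.sum(axis=1)).mean()

# toy 3-class confusion matrix (rows = true class, cols = predicted)
toy = np.array([[8, 1, 1],
                [2, 6, 2],
                [0, 0, 10]])
print(round(balanced_accuracy(toy), 6))  # 0.8 = (0.8 + 0.6 + 1.0) / 3
```

Because every class contributes equally regardless of its frequency, this metric is far less forgiving than plain accuracy on an imbalanced dataset such as HAM10000.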

In [77]:
confusion_matrix = confusion_matrix1 + confusion_matrix2 + confusion_matrix3 + confusion_matrix4 + confusion_matrix5
In [78]:
confusion_matrix_TTA = confusion_matrix1_tta + confusion_matrix2_tta + confusion_matrix3_tta + confusion_matrix4_tta + confusion_matrix5_tta
In [79]:
confusion_matrix
Out[79]:
array([[ 206,   29,   53,    9,   22,    8,    0],
       [  37,  372,   37,   17,   22,   23,    6],
       [  60,   24,  666,    2,  196,  147,    4],
       [  12,    9,    9,   65,   11,    9,    0],
       [  25,   14,  127,   16,  657,  266,    8],
       [  35,  106,  445,   79, 1136, 4863,   41],
       [   3,    4,    1,    0,    8,    7,  119]])
In [80]:
confusion_matrix_TTA
Out[80]:
array([[ 229,   25,   40,    8,   17,    8,    0],
       [  41,  388,   33,   12,   11,   20,    9],
       [  77,   21,  669,    2,  167,  162,    1],
       [  11,   10,    7,   65,   11,   11,    0],
       [  38,    9,  131,   14,  616,  295,   10],
       [  45,   98,  432,   37,  968, 5082,   43],
       [   2,    6,    1,    0,   10,    7,  116]])
In [81]:
confusion_matrix.sum()
Out[81]:
10015
In [82]:
confusion_matrix_TTA.sum()
Out[82]:
10015
In [83]:
#https://stackoverflow.com/questions/31324218/scikit-learn-how-to-obtain-true-positive-true-negative-false-positive-and-fal

FP = confusion_matrix.sum(axis=0) - np.diag(confusion_matrix)  
FN = confusion_matrix.sum(axis=1) - np.diag(confusion_matrix)
TP = np.diag(confusion_matrix)
TN = confusion_matrix.sum() - (FP + FN + TP)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)
# Specificity or true negative rate
TNR = TN/(TN+FP) 
# Precision or positive predictive value
PPV = TP/(TP+FP)
# Negative predictive value
NPV = TN/(TN+FN)
# Fall out or false positive rate
FPR = FP/(FP+TN)
# False negative rate
FNR = FN/(TP+FN)
# False discovery rate
FDR = FP/(TP+FP)

# Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)


print("True Positive Rate         ", TPR)
print("True Negative Rate         ", TNR)
print("Positive Predictive Value  ", PPV)
print("Negative Predictive Value  ", NPV)
print("False Positive Rate        ", FPR)
print("False Negative Rate        ", FNR)
print("False Discovery Rate       ", FDR)
print("Accuracy                   ", ACC)
True Positive Rate          [0.629969 0.723735 0.606005 0.565217 0.590296 0.72528  0.838028]
True Negative Rate          [0.982246 0.980423 0.92463  0.987576 0.843294 0.861027 0.994024]
Positive Predictive Value   [0.544974 0.666667 0.497758 0.345745 0.320175 0.913583 0.668539]
Negative Predictive Value   [0.987444 0.984985 0.950098 0.994912 0.942735 0.607417 0.997662]
False Positive Rate         [0.017754 0.019577 0.07537  0.012424 0.156706 0.138973 0.005976]
False Negative Rate         [0.370031 0.276265 0.393995 0.434783 0.409704 0.27472  0.161972]
False Discovery Rate        [0.455026 0.333333 0.502242 0.654255 0.679825 0.086417 0.331461]
Accuracy                    [0.970744 0.967249 0.889666 0.982726 0.815177 0.770145 0.991812]
In [84]:
TPR.mean()  # balanced multiclass accuracy, without augmentation
Out[84]:
0.6683617121946673
In [85]:
TNR.mean()
Out[85]:
0.9390313951117332
In [86]:
FP = confusion_matrix_TTA.sum(axis=0) - np.diag(confusion_matrix_TTA)  
FN = confusion_matrix_TTA.sum(axis=1) - np.diag(confusion_matrix_TTA)
TP = np.diag(confusion_matrix_TTA)
TN = confusion_matrix_TTA.sum() - (FP + FN + TP)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)
# Specificity or true negative rate
TNR = TN/(TN+FP) 
# Precision or positive predictive value
PPV = TP/(TP+FP)
# Negative predictive value
NPV = TN/(TN+FN)
# Fall out or false positive rate
FPR = FP/(FP+TN)
# False negative rate
FNR = FN/(TP+FN)
# False discovery rate
FDR = FP/(TP+FP)

# Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)


print("True Positive Rate         ", TPR)
print("True Negative Rate         ", TNR)
print("Positive Predictive Value  ", PPV)
print("Negative Predictive Value  ", NPV)
print("False Positive Rate        ", FPR)
print("False Negative Rate        ", FNR)
print("False Discovery Rate       ", FDR)
print("Accuracy                   ", ACC)
True Positive Rate          [0.700306 0.754864 0.608735 0.565217 0.553459 0.757942 0.816901]
True Negative Rate          [0.977911 0.982212 0.92777  0.992626 0.866996 0.848036 0.993619]
Positive Predictive Value   [0.51693  0.696589 0.50952  0.471014 0.342222 0.909937 0.648045]
Negative Predictive Value   [0.989762 0.986678 0.950586 0.994938 0.939501 0.633634 0.997357]
False Positive Rate         [0.022089 0.017788 0.07223  0.007374 0.133004 0.151964 0.006381]
False Negative Rate         [0.299694 0.245136 0.391265 0.434783 0.446541 0.242058 0.183099]
False Discovery Rate        [0.48307  0.303411 0.49048  0.528986 0.657778 0.090063 0.351955]
Accuracy                    [0.968847 0.970544 0.892761 0.987718 0.832152 0.787718 0.991113]
In [87]:
TPR.mean()  # balanced multiclass accuracy, with test-time augmentation
Out[87]:
0.67963208445167
In [88]:
TNR.mean()
Out[88]:
0.9413101678032485

Limitations

The test data from the ISIC 2018 challenge are not available. Although we are confident that our cross-validation strategy is reliable, we cannot submit results to the competition to obtain a leaderboard standing.

We only have a single NVIDIA RTX 2080 GPU with 8 GB of memory. Multiple GPUs would speed up training and could improve accuracy by allowing a larger batch size.

We have made every effort to make the training process deterministic using the seed_everything function. However, CNN training still carries a stochastic component (e.g., non-deterministic GPU operations), so replicated runs may not produce identical results.
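For reference, the standard recipe such a seed_everything function follows (our exact definition appears earlier in the notebook; this sketch is only illustrative) seeds every random number generator the pipeline touches and forces deterministic cuDNN kernels:

```python
import os
import random

import numpy as np
import torch

def seed_everything(seed: int):
    """Seed Python, NumPy and PyTorch RNGs and force deterministic
    cuDNN kernels. Some CUDA operations remain non-deterministic,
    which is why repeated runs can still differ slightly."""
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Setting cudnn.deterministic also disables the fastest convolution kernels, so full reproducibility trades off against training speed.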

There is considerable variability in the results across folds, with validation scores ranging from roughly 62% to 79%. This may suggest that the weighted cross-entropy loss we optimized is not very stable, and that loss functions designed for the detection of rare events, such as focal loss, may be more beneficial for training the neural network.
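As an illustration of that alternative, a minimal multiclass focal loss (Lin et al., 2017) could be dropped in for learn.loss_func in place of the weighted nn.CrossEntropyLoss used above. This is a sketch of the general technique, not a configuration we trained or validated:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Focal loss down-weights well-classified examples by the factor
    (1 - p_t)**gamma, concentrating training on hard, rare classes.
    `weight` plays the same role as the class_weights passed to
    nn.CrossEntropyLoss in the cells above."""
    def __init__(self, gamma: float = 2.0, weight: torch.Tensor = None):
        super().__init__()
        self.gamma = gamma
        self.weight = weight

    def forward(self, logits, target):
        # log-probability of the true class for each sample
        log_pt = F.log_softmax(logits, dim=1)
        log_pt = log_pt.gather(1, target.unsqueeze(1)).squeeze(1)
        pt = log_pt.exp()
        loss = -((1 - pt) ** self.gamma) * log_pt
        if self.weight is not None:
            loss = loss * self.weight[target]
        return loss.mean()
```

With gamma=0 and no weights this reduces to ordinary cross-entropy, so swapping it in (learn.loss_func = FocalLoss(weight=class_weights)) would leave the rest of the training loop unchanged.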

Conclusions

Computer vision algorithms have enormous potential in healthcare applications. This Jupyter notebook illustrates the process of training a CNN to classify a dermatoscopic dataset of seven skin conditions. The same methodology has the potential to improve diagnosis accuracy and reduce healthcare costs in many other applications.

References

[1] Tschandl, P., Rosendahl, C. & Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5, 180161 (2018). doi: 10.1038/sdata.2018.161

[2] He, Kaiming, et al. Deep Residual Learning for Image Recognition. Microsoft Research, 2015. https://arxiv.org/abs/1512.03385v1

[3] ISIC 2018 Challenge, Task 3: Lesion Diagnosis. https://challenge2018.isic-archive.com/leaderboards/

[4] Shang, Kailan. Applying Image Recognition to Insurance. Society of Actuaries. June 2018. https://www.soa.org/globalassets/assets/Files/resources/research-report/2018/applying-image-recognition.pdf

[5] Rajpurkar, Pranav, et al. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. 2017. https://arxiv.org/abs/1711.05225

[6] Levy, Daniel, and Arzav Jain. Breast Mass Classification from Mammograms using Deep Convolutional Neural Networks. 2016. https://arxiv.org/abs/1612.00542

[7] Esteva, Andre, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).

[8] Howard, Jeremy, et al. fastai. GitHub, 2018. https://github.com/fastai/fastai