%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
from fastai.callbacks.hooks import *
from fastai.utils.mem import *
path = untar_data(URLs.CAMVID)
path.ls()
[PosixPath('/root/.fastai/data/camvid/codes.txt'), PosixPath('/root/.fastai/data/camvid/labels'), PosixPath('/root/.fastai/data/camvid/valid.txt'), PosixPath('/root/.fastai/data/camvid/images')]
path_lbl = path/'labels'
path_img = path/'images'
file_names = get_image_files(path_img)
file_names[:3]
[PosixPath('/root/.fastai/data/camvid/images/0016E5_00600.png'), PosixPath('/root/.fastai/data/camvid/images/0001TP_008070.png'), PosixPath('/root/.fastai/data/camvid/images/0006R0_f01770.png')]
lbl_names = get_image_files(path_lbl)
lbl_names[:3]
[PosixPath('/root/.fastai/data/camvid/labels/0006R0_f02460_P.png'), PosixPath('/root/.fastai/data/camvid/labels/0016E5_07965_P.png'), PosixPath('/root/.fastai/data/camvid/labels/0006R0_f03810_P.png')]
img = open_image(file_names[0])
img.show(figsize=(5,5))
# A function to convert an image filename to its mask filename
get_file_label = lambda x: path_lbl/f'{x.stem}_P{x.suffix}'
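As a quick sanity check of the name mapping, here is a standalone pathlib sketch using one of the paths from the listing above (no fastai needed):

```python
from pathlib import Path

# Map an image filename to its mask filename by inserting '_P'
# before the extension, mirroring get_file_label above.
path_lbl = Path('/root/.fastai/data/camvid/labels')
img_path = Path('/root/.fastai/data/camvid/images/0016E5_00600.png')
mask_path = path_lbl / f'{img_path.stem}_P{img_path.suffix}'
print(mask_path.name)  # 0016E5_00600_P.png
```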
mask = open_mask(get_file_label(file_names[0]))
mask.show(figsize=(5,5), alpha=1)
src_size = np.array(mask.shape[1:])
src_size, mask.data
(array([720, 960]), tensor([[[21, 21, 21, ..., 4, 4, 4], [21, 21, 21, ..., 4, 4, 4], [21, 21, 21, ..., 4, 4, 4], ..., [17, 17, 17, ..., 17, 17, 17], [17, 17, 17, ..., 17, 17, 17], [17, 17, 17, ..., 17, 17, 17]]]))
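As the tensor above shows, the mask stores one integer class index per pixel (21, 4, 17 are indices into codes.txt). A tiny numpy sketch with a made-up patch:

```python
import numpy as np

# Hypothetical 2x3 mask patch: each pixel holds a class index,
# just like the 720x960 CamVid mask tensor above.
mask = np.array([[21, 21, 4],
                 [17, 17, 17]])
print(np.unique(mask))  # class indices present in this patch
```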
#All the possible categories
codes = np.loadtxt(path/'codes.txt', dtype=str)
codes
array(['Animal', 'Archway', 'Bicyclist', 'Bridge', 'Building', 'Car', 'CartLuggagePram', 'Child', 'Column_Pole', 'Fence', 'LaneMkgsDriv', 'LaneMkgsNonDriv', 'Misc_Text', 'MotorcycleScooter', 'OtherMoving', 'ParkingBlock', 'Pedestrian', 'Road', 'RoadShoulder', 'Sidewalk', 'SignSymbol', 'Sky', 'SUVPickupTruck', 'TrafficCone', 'TrafficLight', 'Train', 'Tree', 'Truck_Bus', 'Tunnel', 'VegetationMisc', 'Void', 'Wall'], dtype='<U17')
For this we will use half the original image size and a batch size of 8. We will also apply the same transforms to the target masks.
size = src_size//2
bs=8
src = (SegmentationItemList.from_folder(path_img)
       .split_by_fname_file('../valid.txt')
       .label_from_func(get_file_label, classes=codes))
data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))
Let's see what our databunch looks like.
data.show_batch(2, figsize=(10,10))
name2id = {v:k for k,v in enumerate(codes)}
name2id
{'Animal': 0, 'Archway': 1, 'Bicyclist': 2, 'Bridge': 3, 'Building': 4, 'Car': 5, 'CartLuggagePram': 6, 'Child': 7, 'Column_Pole': 8, 'Fence': 9, 'LaneMkgsDriv': 10, 'LaneMkgsNonDriv': 11, 'Misc_Text': 12, 'MotorcycleScooter': 13, 'OtherMoving': 14, 'ParkingBlock': 15, 'Pedestrian': 16, 'Road': 17, 'RoadShoulder': 18, 'SUVPickupTruck': 22, 'Sidewalk': 19, 'SignSymbol': 20, 'Sky': 21, 'TrafficCone': 23, 'TrafficLight': 24, 'Train': 25, 'Tree': 26, 'Truck_Bus': 27, 'Tunnel': 28, 'VegetationMisc': 29, 'Void': 30, 'Wall': 31}
void_code = name2id['Void']
We will train a U-Net on our data, using a ResNet34 as the encoder backbone.
We will also use a custom metric, acc_camvid, to judge the accuracy of the model while training, since the default accuracy metric would not be a good option here. acc_camvid compares each pixel of the predicted mask against the corresponding pixel of the ground truth, ignoring 'Void' pixels, marks it correct (1) or incorrect (0), and takes the average across all pixels.
def acc_camvid(input, target):
    # per-pixel accuracy, ignoring pixels labelled 'Void'
    target = target.squeeze(1)
    mask = target != void_code
    return (input.argmax(dim=1)[mask] == target[mask]).float().mean()
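To see what the metric computes, here is a numpy sketch of the same logic on a toy 2x2 prediction (all values here are made up for illustration; the real metric above operates on torch tensors):

```python
import numpy as np

void_code = 2  # hypothetical 'Void' index for this toy example

# class scores per pixel: shape (classes, H, W)
logits = np.array([[[2., 0.], [0., 2.]],   # class 0 scores
                   [[0., 2.], [2., 0.]],   # class 1 scores
                   [[0., 0.], [0., 0.]]])  # class 2 ('Void') scores
target = np.array([[0, 1], [2, 1]])        # one pixel is Void

preds = logits.argmax(axis=0)              # per-pixel predicted class
mask = target != void_code                 # drop Void pixels
acc = (preds[mask] == target[mask]).mean() # 2 of 3 kept pixels match
print(acc)
```

Note that the Void pixel is excluded entirely, so the denominator is 3, not 4.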
metrics=acc_camvid
learn = unet_learner(data, models.resnet34, metrics=metrics)
Let's check out the learning rate.
lr_find(learn)
learn.recorder.plot()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
The plot is a bit noisy, but we can see that the curve starts falling around 1e-5.
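We pass a learning-rate range with slice(3e-5, 3e-4) to fit_one_cycle; fastai then spreads the rates across layer groups so that early layers train more slowly than the head. Roughly (a sketch of the idea, not fastai's exact code), assuming three layer groups:

```python
import numpy as np

# Spread rates geometrically from the lowest (earliest layers)
# to the highest (the head), as slice(lo, hi) roughly does.
lo, hi = 3e-5, 3e-4
lrs = np.geomspace(lo, hi, num=3)
print(lrs)  # earliest layers get lo, the head gets hi
```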
learn.fit_one_cycle(10, slice(3e-5, 3e-4))
epoch | train_loss | valid_loss | acc_camvid | time |
---|---|---|---|---|
0 | 1.713877 | 1.033844 | 0.734518 | 03:25 |
1 | 1.018476 | 0.680098 | 0.826204 | 03:24 |
2 | 0.768844 | 0.569019 | 0.839821 | 03:24 |
3 | 0.617559 | 0.478099 | 0.863465 | 03:24 |
4 | 0.551545 | 0.444398 | 0.869286 | 03:24 |
5 | 0.485277 | 0.384654 | 0.885676 | 03:24 |
6 | 0.435844 | 0.352837 | 0.894725 | 03:24 |
7 | 0.394409 | 0.331936 | 0.904030 | 03:24 |
8 | 0.367578 | 0.316333 | 0.907714 | 03:24 |
9 | 0.349075 | 0.314102 | 0.907747 | 03:24 |
90.7%, that's pretty accurate, but let's save the model, unfreeze it, and try training all the layers.
learn.save('stage-1')
learn.unfreeze()
lr_find(learn)
learn.recorder.plot()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
The plot looks a bit strange, but if we look at the y-axis we can see that the range is quite small. The loss seems to rise after 1e-4.
learn.fit_one_cycle(10, slice(1e-5, 1e-4))
epoch | train_loss | valid_loss | acc_camvid | time |
---|---|---|---|---|
0 | 0.344179 | 0.313869 | 0.907924 | 03:31 |
1 | 0.352107 | 0.317889 | 0.907994 | 03:30 |
2 | 0.362538 | 0.333120 | 0.902628 | 03:31 |
3 | 0.354002 | 0.335093 | 0.906862 | 03:31 |
4 | 0.327157 | 0.300026 | 0.910661 | 03:31 |
5 | 0.312329 | 0.301529 | 0.911706 | 03:31 |
6 | 0.295170 | 0.280674 | 0.918273 | 03:30 |
7 | 0.280219 | 0.272744 | 0.919812 | 03:31 |
8 | 0.272149 | 0.276068 | 0.918739 | 03:31 |
9 | 0.264728 | 0.273691 | 0.919095 | 03:31 |
The accuracy improved, but we want to get to at least 93%. Let's train some more.
learn.save('stage2_1e5-1e4')
lr_find(learn)
learn.recorder.plot()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
learn.fit_one_cycle(10, slice(1e-5, 1e-4))
epoch | train_loss | valid_loss | acc_camvid | time |
---|---|---|---|---|
0 | 0.262895 | 0.276049 | 0.918928 | 03:31 |
1 | 0.272435 | 0.293768 | 0.912217 | 03:32 |
2 | 0.283823 | 0.276396 | 0.919116 | 03:33 |
3 | 0.278473 | 0.270213 | 0.921988 | 03:34 |
4 | 0.272586 | 0.301436 | 0.911976 | 03:34 |
5 | 0.259340 | 0.318504 | 0.911728 | 03:34 |
6 | 0.248526 | 0.280525 | 0.919261 | 03:34 |
7 | 0.241528 | 0.279612 | 0.917928 | 03:34 |
8 | 0.231219 | 0.272694 | 0.921347 | 03:34 |
9 | 0.224182 | 0.276631 | 0.920725 | 03:34 |
learn.save('stage3_1e5-1e4')
It's improving overall, but pretty slowly. The ratio of training loss to validation loss also looks fine. Maybe we should try a higher learning rate.
lr_find(learn)
learn.recorder.plot()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
The plot says otherwise. We should pick a similar range again, as a higher rate would give poor results.
learn.fit_one_cycle(12, slice(1e-5, 1e-4))
epoch | train_loss | valid_loss | acc_camvid | time |
---|---|---|---|---|
0 | 0.224240 | 0.278531 | 0.919208 | 03:33 |
1 | 0.227785 | 0.282874 | 0.917643 | 03:34 |
2 | 0.241994 | 0.330280 | 0.899988 | 03:34 |
3 | 0.243677 | 0.329277 | 0.908817 | 03:33 |
4 | 0.247586 | 0.271032 | 0.918950 | 03:34 |
5 | 0.237501 | 0.305518 | 0.910586 | 03:33 |
6 | 0.227155 | 0.277850 | 0.918582 | 03:32 |
7 | 0.214532 | 0.277826 | 0.919533 | 03:32 |
8 | 0.207704 | 0.281854 | 0.920700 | 03:34 |
9 | 0.201587 | 0.264264 | 0.923797 | 03:35 |
10 | 0.199155 | 0.269579 | 0.923109 | 03:37 |
11 | 0.195739 | 0.270489 | 0.922809 | 03:37 |
Progress seems to have slowed down even more. It feels like we should increase the rate; everything else looks fine. Let's train a bit more.
lr_find(learn)
learn.recorder.plot()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
learn.fit_one_cycle(12, slice(1e-6, 5e-4))
epoch | train_loss | valid_loss | acc_camvid | time |
---|---|---|---|---|
0 | 0.201823 | 0.301157 | 0.916544 | 03:34 |
1 | 0.255735 | 0.300689 | 0.914773 | 03:35 |
2 | 0.320701 | 0.313358 | 0.904393 | 03:35 |
3 | 0.319809 | 0.383289 | 0.893561 | 03:35 |
4 | 0.305791 | 0.287613 | 0.919432 | 03:34 |
5 | 0.280641 | 0.279118 | 0.922000 | 03:34 |
6 | 0.256338 | 0.300819 | 0.916679 | 03:34 |
7 | 0.234987 | 0.274713 | 0.924262 | 03:34 |
8 | 0.214243 | 0.251499 | 0.928236 | 03:34 |
9 | 0.201073 | 0.278869 | 0.923350 | 03:35 |
10 | 0.188191 | 0.270839 | 0.926915 | 03:35 |
11 | 0.182626 | 0.266649 | 0.926646 | 03:35 |
The accuracy is now rising very slowly. Maybe we should let it train for more cycles.
We will train one more time, with more cycles. If we don't reach 93%, we will start over and try something else.
lr_find(learn)
learn.recorder.plot()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
learn.fit_one_cycle(18, slice(1e-6, 5e-4))
epoch | train_loss | valid_loss | acc_camvid | time |
---|---|---|---|---|
0 | 0.183913 | 0.258469 | 0.929280 | 03:34 |
1 | 0.187855 | 0.274599 | 0.924761 | 03:34 |
2 | 0.200375 | 0.270287 | 0.926045 | 03:34 |
3 | 0.226437 | 0.277119 | 0.921837 | 03:34 |
4 | 0.239927 | 0.277793 | 0.922941 | 03:34 |
5 | 0.249407 | 0.283909 | 0.924975 | 03:34 |
6 | 0.237342 | 0.273870 | 0.923607 | 03:34 |
7 | 0.218582 | 0.261884 | 0.926841 | 03:34 |
8 | 0.216637 | 0.264195 | 0.925641 | 03:34 |
9 | 0.202231 | 0.287281 | 0.923400 | 03:34 |
10 | 0.190338 | 0.265273 | 0.924873 | 03:34 |
11 | 0.183782 | 0.284800 | 0.926929 | 03:34 |
12 | 0.175271 | 0.276577 | 0.925489 | 03:33 |
13 | 0.166248 | 0.274923 | 0.929091 | 03:34 |
14 | 0.160326 | 0.269889 | 0.929934 | 03:34 |
15 | 0.157293 | 0.283274 | 0.929339 | 03:34 |
16 | 0.153415 | 0.286666 | 0.929660 | 03:34 |
17 | 0.151591 | 0.282741 | 0.930236 | 03:34 |
So, we got 93%, which is pretty good. Let's save the model and check it against the holdout set.
learn.save('93_percent')
learn.path = Path()
learn.path.ls()
[PosixPath('.config'), PosixPath('export.pkl'), PosixPath('drive'), PosixPath('sample_data')]
learn.export()
path = Path()
pred_model = load_learner(path)
Since the images in the holdout set have different dimensions, we will resize them to our mask size; otherwise we would not be able to overlay the mask on the image.
def get_image(path):
    # resize holdout images to the training mask size (360x480)
    img = open_image(path)
    channel, h, w = img.shape
    img = img.resize(torch.Size([channel, 360, 480]))
    return img
classes=pred_model.data.classes
c2i = {v:k for k,v in enumerate(classes)}
Here's the first image!
You can see that it looks a bit stretched due to the resizing.
img1 = get_image(path/'img1.jpeg')
img1
preds, pred_idx, outputs = pred_model.predict(img1)
img1.show(y=preds, figsize=(12,12))
Here's the prediction by our model.
You can see the masks overlaid on objects, but it's a bit difficult to make sense of what the mask predicted like this. Some of the boundaries look pretty good, others not so much.
I found this code in one of the PRs in the fastai library. It's still under development, aimed at improving the interpretation of segmentation results.
This function displays the mask along with the colors of the classes used, in a grid.
(One of several problems with this function is that it can only handle 20 distinct colors; after that it repeats colors.)
def interp_show(ims, classes, sz, c2i, cmap='tab20'):
    'Show an ImageSegment restricted to the given classes'
    fig, axes = plt.subplots(1, 2, figsize=(sz, sz))
    # image: keep only the pixels belonging to the requested classes
    mask = (torch.cat([ims.data==i for i in [c2i[c] for c in classes]])
            .max(dim=0)[0][None,:]).long()
    masked_im = image2np(ims.data*mask)
    im = axes[0].imshow(masked_im, cmap=cmap)
    # labels: lay the class indices out in a square legend grid
    labels = list(np.unique(masked_im))
    c = len(labels); n = math.ceil(np.sqrt(c))
    label_im = labels + [np.nan]*(n**2-c)
    label_im = np.array(label_im).reshape(n, n)
    axes[1].imshow(label_im, cmap=cmap)
    i2c = {i:c for c,i in c2i.items()}
    for i,l in enumerate([i2c[l] for l in labels]):
        div, mod = divmod(i, n)
        axes[1].text(mod, div, f"{l}", ha='center', color='white', fontdict={'size':10})
    axes[1].set_yticks([]); axes[1].set_xticks([])
interp_show(preds, classes, 20, c2i)
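The 20-color limitation mentioned earlier comes from the 'tab20' colormap: it has only 20 distinct entries, so higher class indices effectively wrap around and reuse colors. A sketch of the effect (the modular mapping here is an illustration, not matplotlib's exact normalization):

```python
# 'tab20' has 20 distinct colors, so a class index i and i + 20
# can end up sharing a color in the legend grid
n_colors = 20
for cls in (1, 5, 21, 25):
    print(cls, '-> color slot', cls % n_colors)
```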
In the image above we can see the mask and the colors used for the classes. In this particular case the colors of the mask alone and of the mask over the image are similar, but that won't always be the case, as we will see in the other examples.
So, if we now compare both images, the results are a bit easier to interpret.
Image 1 result: the predictions in this case were pretty accurate. The model picked out the buildings, trees, vegetation, bicyclist, car (auto), road, sign symbol, pole, and fence. I don't have any complaints.
img2 = get_image(path/'img2.jpeg')
img2
preds2, pred_idx2, outputs2 = pred_model.predict(img2)
img2.show(y=preds2, figsize=(12,12))
interp_show(preds2, classes, 20, c2i)
Image 2 result: this was an image of heavy traffic. Again, the predictions are pretty good. The model detected the road, cars, vegetation, buildings, sky, and column poles. It did get confused by a few things this time: it was not able to detect all the cars (especially the ones that were not white), and it mistook part of the construction site along the road for trees. But these were minor issues, and it got most things right.
img3 = get_image(path/'img3.jpeg')
img3
preds3, pred_idx3, outputs3 = pred_model.predict(img3)
img3.show(y=preds3, figsize=(12,12))
interp_show(preds3, classes, 20, c2i)
Image 3 result: this was an image of a mostly empty street with several vehicles parked alongside the road. This time the model struggled a bit. It confused the road with the sidewalk, which is understandable: the road was old and worn, its texture looked more like a sidewalk than a road, and it was narrow. There was also a Truck_Bus spot in the sky, which is hard to make sense of. The thing it clearly failed on was the stationary bikes: many were parked along the road, and while it outlined the region correctly, it labeled them as OtherMoving.
How good was the model at predicting this "holdout" test set? Where is it having difficulty or particular success?
Overall I think the model did pretty well on the "holdout" test. It was excellent at predicting trees, vegetation, buildings, street signs and symbols, and pedestrians. But it struggled when there was a high density of vehicles: in picture 2 it missed a few cars, and in picture 3 it didn't categorize the bikes correctly. Although, in picture 3 it did create an excellent region around the bikes and categorized them as OtherMoving, so I guess we can give it partial credit for that.
How to improve on the quality with the resources we have?