#!/usr/bin/env python
# coding: utf-8

# # The Devil lives in the details
# > Resizing method matters...
#
# - toc: true
# - badges: true
# - comments: true
# - categories: [Pytorch, fastai]
# - author: Thomas Capelle
# - image: images/pil2tensor.png

# ![TL;DR](../images/pil2tensor.png)

# Yesterday I was refactoring some code to put into our production code base. It is a simple image classifier trained with fastai. In our deployment environment we do not include fastai as a requirement and rely only on pure pytorch to process the data and make the inference. (I am waiting to finally be able to install only the fastai vision part, without the NLP dependencies; this is coming soon, probably in fastai 2.3, at least it is on [Jeremy's roadmap](https://github.com/fastai/fastai/projects/1#card-52606857).) So I have to make the reading and preprocessing of images as close as possible to fastai's `Transform` pipeline, to get accurate model outputs.
#
# After converting the transforms to `torchvision.transforms` I noticed that my model's performance dropped significantly. Initially I thought that it was fastai's fault, but the whole problem came from the interaction between `torchvision.io.image.read_image` and `torchvision.transforms.Resize`. This transform can accept `PIL.Image.Image` or Tensors; in short, the resizing does not produce the same image, one is way softer than the other. The solution was not to use the new Tensor API and just use `PIL` as the image reader.
#
# > TL;DR: torchvision's `Resize` behaves differently if the input is a `PIL.Image` or a torch tensor from `read_image`. Be consistent between training and deployment.

# Let's take a quick look at the preprocessing used for training and the corresponding torch version with the new tensor API, as shown [here](https://github.com/pytorch/vision/blob/master/examples/python/tensor_transforms.ipynb)

# Below are the versions of fastai, fastcore, torch, and torchvision in use at the time of writing:
#
# - `python`     : 3.8.6
# - `fastai`     : 2.2.8
# - `fastcore`   : 1.3.19
# - `torch`      : 1.7.1
# - `torch-cuda` : 11.0
# - `torchvision`: 0.8.2
#
# > Note: You can easily grab this info from `fastai.test_utils.show_install`

# ## A simple example
# > Let's make a simple classifier on the PETS dataset; for more details see the [fastai tutorial](https://docs.fast.ai/tutorial.vision.html)

# In[41]:

#hide
# ! pip install -U git+git://github.com/fastai/fastai@master
from fastai.vision.all import *

set_seed(2021)
defaults.use_cuda = False

# let's grab the data

# In[43]:

path = untar_data(URLs.PETS)
files = get_image_files(path/"images")

def label_func(f): return f[0].isupper()

dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize((256, 192)))

# A `Learner` is just a wrapper around the `DataLoaders` and the model. We will grab an ImageNet-pretrained `resnet18`; we don't really need to train it to illustrate the problem.

# In[44]:

learn = cnn_learner(dls, resnet18)

# and grab one image (`load_image` comes from fastai and returns an in-memory `PIL.Image.Image`)

# In[45]:

fname = files[1]
img = load_image(fname)
img

# We can call the prediction using fastai's `predict` method; this will apply the same transforms as for the validation set.

# In[46]:

learn.predict(fname)
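# A quick sanity check (not in the original post): `predict` returns a tuple of
# (decoded label, label index, class probabilities). The probabilities tensor is
# what we will try to reproduce by hand below.

# In[ ]:

label, idx, probs = learn.predict(fname)
label, probs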
# Let's understand what is happening under the hood. `predict` will:
#
# - create a PIL image
# - transform the image to a pytorch Tensor
# - scale values to `[0, 1]` by dividing by 255
# - normalize with imagenet stats
#
# Doing this by hand means extracting the preprocessing transforms:

# In[47]:

dls.valid.tfms

# In[48]:

dls.valid.after_item

# In[49]:

dls.valid.after_batch

# Let's put all transforms together in a fastcore `Pipeline`

# In[50]:

preprocess = Pipeline([Transform(PILImage.create),
                       Resize((256, 192)),
                       ToTensor,
                       IntToFloatTensor,
                       Normalize.from_stats(*imagenet_stats, cuda=False)])

# we can then preprocess the image:

# In[51]:

tfm_img = preprocess(fname)
tfm_img.shape

# and we get the exact same predictions as before

# In[52]:

with torch.no_grad():
    preds = learn.model(tfm_img).softmax(1)
preds

# ## Using torchvision preprocessing
# > Now let's try to replace the fastai transforms with torchvision ones

# In[53]:

import PIL
import torchvision.transforms as T

# In[54]:

pil_image = load_image(fname)
pil_image

# In[55]:

type(pil_image)

# let's first resize the image; we can do this directly on the `PIL.Image.Image` or using `T.Resize`, which works both on `PIL` images and `Tensor`s

# In[56]:

resize = T.Resize([256, 192])
res_pil_image = resize(pil_image)

# we can then use `T.ToTensor`; this transforms to a tensor and divides by 255, so it is equivalent to `ToTensor` + `IntToFloatTensor` from fastai.

# In[57]:

timg = T.ToTensor()(res_pil_image)

# then we have to normalize it:

# In[58]:

norm = T.Normalize(*imagenet_stats)
nimg = norm(timg).unsqueeze(0)

# and we get almost identical results! ouff.....

# In[60]:

with torch.no_grad():
    preds = learn.model(nimg).softmax(1)
preds

# ## Torchvision new Tensor API
# > Let's try this new Tensor-based API that torchvision introduced in `v0.8` then!

# In[61]:

import torchvision.transforms as T
from torchvision.io.image import read_image

# `read_image` is pretty neat: it reads the image directly into a pytorch tensor, so there is no need for external image libraries. Using this API has many advantages, as one can group the model and part of the preprocessing as a whole, and then export everything to torchscript together, model + preprocessing, as shown in the example [here](https://github.com/pytorch/vision/blob/master/examples/python/tensor_transforms.ipynb)

# In[62]:

timg = read_image(str(fname))

# it is sad that it does not support pathlib objects in 2021...

# In[63]:

resize = T.Resize([256, 192])
res_timg = resize(timg)

# we have to scale it, and there is a new transform to do this:

# In[64]:

scale = T.ConvertImageDtype(torch.float)
scaled_timg = scale(res_timg)

# In[65]:

norm = T.Normalize(*imagenet_stats)
nimg = norm(scaled_timg).unsqueeze(0)

# Ok, the result is pretty different...

# In[66]:

with torch.no_grad():
    preds = learn.model(nimg).softmax(1)
preds

# If you trained your model with the old API, reading images using PIL, you may find yourself lost as to why the model is performing poorly. My classifier was predicting the complete opposite for some images, and that's how I realized that something was wrong!
#
# Let's dive into what is happening...
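# Before comparing the images visually, a quick numerical check (a sketch reusing
# the `resize` and `scale` objects defined above; it is not in the original post):
# the two pipelines diverge exactly at the resize step, before any normalization
# is involved.

# In[ ]:

pil_t = T.ToTensor()(resize(load_image(fname)))   # PIL path: resize, then to float tensor
tensor_t = scale(resize(read_image(str(fname))))  # tensor path: read, resize, then to float
(pil_t - tensor_t).abs().mean(), (pil_t - tensor_t).abs().max()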
# ## Comparing Resizing methods
# > T.Resize on PIL image vs Tensor Image

# We will use fastai's `show_images` to make the loading and showing of tensor images easy

# In[67]:

resize = T.Resize([256, 192], interpolation=PIL.Image.BILINEAR)

# In[68]:

pil_img = load_image(fname)
res_pil_img = image2tensor(resize(pil_img))
tensor_img = read_image(str(fname))
res_tensor_img = resize(tensor_img)
# cast to int before subtracting to avoid uint8 wrap-around, then back to uint8 for display
difference = (res_tensor_img.int() - res_pil_img.int()).abs().to(torch.uint8)

# In[69]:

show_images([res_pil_img, res_tensor_img, difference], figsize=(10,5), titles=['PIL', 'Tensor', 'Dif'])

# Let's zoom and plot

# In[70]:

show_images([res_pil_img[:, 20:80, 30:100], res_tensor_img[:, 20:80, 30:100], difference[:, 20:80, 30:100]],
            figsize=(12,8), titles=['PIL', 'Tensor', 'Dif'])

# The `PIL` image is smoother; it is not necessarily better, but it is different. From my testing, for darker images the `PIL` resize has less moiré effect (less noise)

# ## Extra: What if I want to use OpenCV?
# > A popular choice for pipelines that rely on numpy array transforms, like [Albumentations](https://github.com/albumentations-team/albumentations/blob/master/docs/index.rst)

# In[71]:

import cv2

# OpenCV opens the image directly as a numpy array

# In[72]:

img_cv = cv2.imread(str(fname))
# note: cv2.resize takes (width, height), so (192, 256) matches Resize((256, 192)) above
res_img_cv = cv2.resize(img_cv, (192, 256), interpolation=cv2.INTER_LINEAR)

# BGR to RGB, and channels first.

# In[73]:

res_img_cv = res_img_cv.transpose((2,0,1))[::-1,:,:].copy()

# In[74]:

timg_cv = cast(res_img_cv, TensorImage)
timg_cv.shape

# In[75]:

timg_cv[:, 20:80, 30:100].show(figsize=(8,8))

# pretty bad also...

# In[76]:

learn.predict(timg_cv)

# ### With the `INTER_AREA` flag
# > This method is closer to the PIL image resize, as it has a kernel that smooths the image.

# In[77]:

img_cv_area = cv2.imread(str(fname))
img_cv_area = cv2.resize(img_cv_area, (192, 256), interpolation=cv2.INTER_AREA)

# In[78]:

img_cv_area = img_cv_area.transpose((2,0,1))[::-1,:,:].copy()

# In[79]:

timg_cv_area = cast(img_cv_area, TensorImage)

# In[80]:

timg_cv_area[:, 20:80, 30:100].show(figsize=(8,8))

# kind of better...

# In[81]:

learn.predict(timg_cv_area)

# ## Speed comparison
# > Let's do some basic performance comparison

# In[82]:

torch_tensor_tfms = nn.Sequential(T.Resize([256, 192]), T.ConvertImageDtype(torch.float))

def torch_pipe(fname): return torch_tensor_tfms(read_image(str(fname)))

# In[83]:

get_ipython().run_line_magic('timeit', 'torch_pipe(fname)')

# In[84]:

torch_pil_tfms = T.Compose([T.Resize([256, 192]), T.ToTensor()])

def pil_pipe(fname): return torch_pil_tfms(load_image(fname))

# In[85]:

get_ipython().run_line_magic('timeit', 'pil_pipe(fname)')

# > Note: I am using [pillow-simd](https://github.com/uploadcare/pillow-simd) with AVX enabled.

# ## [Beta] Torchvision 0.10
# > This issue has been partially solved in the latest release of torchvision

# In[66]:

from fastcore.all import *
from PIL import Image
from fastai.vision.core import show_images, image2tensor
import torch, torchvision
import torchvision.transforms as T
import torchvision.transforms.functional as F
from torchvision.io.image import read_image

torch.__version__, torchvision.__version__

# In[3]:

import urllib.request

url = "https://user-images.githubusercontent.com/3275025/123925242-4c795b00-d9bd-11eb-9f0c-3c09a5204190.jpg"
img = Image.open(urllib.request.urlopen(url))

# let's use this image that comes from the [issue](https://github.com/pytorch/vision/issues/2950) on github; it really shows the problem of the non-antialiased method on the grey concrete.
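# A small aside (not in the original post): fastcore's `fastuple` supports
# elementwise arithmetic, which is why dividing a shape tuple by an int works
# in the next cell. A tiny illustration:

# In[ ]:

fastuple((512, 256)) // 4  # -> (128, 64), elementwise integer division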
# In[17]:

small_size = fastuple(img.shape) // 4
img.shape, small_size

# In[37]:

resized_pil_image = img.resize(small_size[::-1])  # PIL takes (width, height)
resize_non_anti_alias = T.Resize(small_size, interpolation=Image.BILINEAR)
resize_antialias = T.Resize(small_size, interpolation=Image.BILINEAR, antialias=True)  # this is new in torchvision 0.10

# In[72]:

timg = T.ToTensor()(img)
# timg = image2tensor(img)  # you could use fastai's `image2tensor` to get non-scaled tensors instead

# remember that `T.ToTensor` here also divides the image values by 255 to get them in `[0,1]`

# In[73]:

timg.min(), timg.max()

# In[74]:

timg_naa, timg_aa = resize_non_anti_alias(timg), resize_antialias(timg)
show_images([resized_pil_image, timg_naa, timg_aa],
            titles=['pil resized', 'tensor non antialiased', 'tensor with antialias'], figsize=(24,12))

# let's compare the PIL vs the tensor antialiased resize:

# In[118]:

tensor_pil_image_resized = T.ToTensor()(resized_pil_image)
difference = 255 * (tensor_pil_image_resized - timg_aa).abs()
show_images([tensor_pil_image_resized[:, 150:200, 150:200], timg_aa[:, 150:200, 150:200], difference[:, 150:200, 150:200]],
            titles=['pil resized', 'tensor resized antialiased', 'difference'], figsize=(24,12))

# way better than before.

# In[119]:

[f(difference) for f in [torch.max, torch.min, torch.median]]

# ## Conclusions
#
# Ideally, deploy the model with the exact same transforms it was validated with, or at least check that the performance does not degrade.
# I would like to see more consistency between both APIs in pure pytorch: the user is pushed towards the new `pillow-free` pipeline, but the results are not consistent. Resizing is a fundamental part of image preprocessing in most use cases.
#
# - There is an [open issue](https://github.com/pytorch/vision/issues/2950) on the torchvision github about this.
# - There is also one about the difference between PIL and OpenCV [here](https://github.com/python-pillow/Pillow/issues/2718)
# - Pillow appears to be faster and can open a larger variety of image formats.
#
# ~~This was pretty frustrating, as it was not obvious where the model was failing.~~
#
# > Important: It appears that torchvision 0.10 has solved this issue! The feature is still in beta, and probably the default arg should be `antialias=True`.
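# To close, a minimal sketch of the takeaway (assuming torchvision >= 0.10, and
# that `learn`, `fname`, and `read_image` from earlier in the post are still in
# scope; the mean/std are the usual imagenet stats written out explicitly so the
# snippet does not depend on fastai): if the model was trained with PIL-based
# transforms but you deploy with the tensor API, convert to float and pass
# `antialias=True` so the tensor resize closely matches the PIL one.

# In[ ]:

from torch import nn

imagenet_mean, imagenet_std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
deploy_tfms = nn.Sequential(
    T.ConvertImageDtype(torch.float),       # uint8 [0,255] -> float [0,1]
    T.Resize([256, 192], antialias=True),   # antialiased resize, close to the PIL behaviour
    T.Normalize(imagenet_mean, imagenet_std),
)

with torch.no_grad():
    preds = learn.model(deploy_tfms(read_image(str(fname))).unsqueeze(0)).softmax(1)
preds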