This guide will give a quick intro to training PyTorch models with HugsVision. We'll start by loading in some data and defining a model, then we'll train it for a few epochs and see how well it does.
Note: The easiest way to use this tutorial is as a colab notebook, which allows you to dive in with no setup. We recommend you enable a free GPU with
Runtime → Change runtime type → Hardware Accelerator: GPU
Note: You need to have at least Python 3.6 to run the scripts.
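If you're not sure which interpreter your notebook is running, a quick sanity check (just a convenience snippet, not part of HugsVision):

```python
import sys

# HugsVision scripts require Python 3.6 or newer
assert sys.version_info >= (3, 6), "Python 3.6+ required, found " + sys.version
print(sys.version_info[:3])
```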
First we install HugsVision if needed.
```python
try:
    import hugsvision
except ImportError:
    !pip install -q hugsvision
    import hugsvision

print(hugsvision.__version__)
```
Next, we convert the CSV annotations into an image-folder layout, with one sub-directory per label:

```python
import os
import os.path
from shutil import copyfile

from tqdm import tqdm
import pandas as pd

df = pd.read_csv("./train_data.csv")

img_in  = "./small_train_data_set/small_train_data_set/"
img_out = "./data/"

for index, row in tqdm(df.iterrows(), total=len(df)):

    label = "pneumothorax" if row['target'] == 1 else "normal"

    path_in  = img_in + row['file_name']
    path_out = img_out + label + "/" + row['file_name']

    # Skip if the input image does not exist
    if not os.path.isfile(path_in):
        continue

    # Create the output directory for the label if needed
    if not os.path.isdir(img_out + label):
        os.makedirs(img_out + label)
        print("Directory for the label " + label + " created!")

    # Copy the image
    copyfile(path_in, path_out)
```
Once it has been converted, we can start loading the data.
```python
from hugsvision.dataio.VisionDataset import VisionDataset

train, test, id2label, label2id = VisionDataset.fromImageFolder(
    "./data/",
    test_ratio   = 0.15,
    balanced     = True,
    augmentation = True,
)
```
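For intuition, `fromImageFolder` reads one sub-folder per label, optionally balances the classes, and holds out `test_ratio` of the images for evaluation. A rough pure-Python sketch of that split logic (my own approximation for illustration, not the library's actual code):

```python
import random

def balanced_split(files_by_label, test_ratio=0.15, seed=42):
    """Down-sample every class to the smallest class size, then
    hold out `test_ratio` of each class for the test set."""
    rng = random.Random(seed)
    smallest = min(len(files) for files in files_by_label.values())
    train, test = [], []
    for label, files in files_by_label.items():
        files = rng.sample(files, smallest)  # balance the classes
        n_test = int(smallest * test_ratio)
        test  += [(f, label) for f in files[:n_test]]
        train += [(f, label) for f in files[n_test:]]
    return train, test

# Toy example: 100 "normal" vs 60 "pneumothorax" images
data = {
    "normal":       ["n%d.jpg" % i for i in range(100)],
    "pneumothorax": ["p%d.jpg" % i for i in range(60)],
}
tr, te = balanced_split(data, test_ratio=0.15)
print(len(tr), len(te))  # 102 18 (60 kept per class, 9 of each held out)
```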
Now we can choose the base model that we will fine-tune for our needs.
The choice isn't very large, since few models are available on HuggingFace for this task yet.
To be compatible with HugsVision, the model needs to be exported in PyTorch and, obviously, to support the image-classification task.
Models matching these criteria are listed here.
At the time of writing, I recommend the following models:
- google/vit-base-patch16-224-in21k
- google/vit-base-patch16-224
- facebook/deit-base-distilled-patch16-224
- microsoft/beit-base-patch16-224
Note: Please specify ignore_mismatched_sizes=True for both the model and the feature_extractor if you aren't using the model chosen below.
```python
huggingface_model = 'google/vit-base-patch16-224-in21k'
```
Once the model is chosen, we can build the Trainer and start the fine-tuning.
Note: Import the FeatureExtractor and ForImageClassification classes that match the model you chose above.
```python
from transformers import ViTFeatureExtractor, ViTForImageClassification
from hugsvision.nnet.VisionClassifierTrainer import VisionClassifierTrainer

trainer = VisionClassifierTrainer(
    model_name = "MyPneumoModel",
    train      = train,
    test       = test,
    output_dir = "./out/",
    max_epochs = 1,
    batch_size = 32,  # On RTX 2080 Ti
    model = ViTForImageClassification.from_pretrained(
        huggingface_model,
        num_labels = len(label2id),
        label2id   = label2id,
        id2label   = id2label,
    ),
    feature_extractor = ViTFeatureExtractor.from_pretrained(
        huggingface_model,
    ),
)
```
Using the F1-score metric gives us a better picture of the predictions across all the labels and helps us find out whether there are any anomalies with a specific label.
```python
ref, hyp = trainer.evaluate_f1_score()
```

```
              precision    recall  f1-score   support

      normal       0.76      0.24      0.36       191
pneumothorax       0.41      0.88      0.56       114

    accuracy                           0.48       305
   macro avg       0.58      0.56      0.46       305
weighted avg       0.63      0.48      0.43       305
```
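The reference and predicted labels returned by `evaluate_f1_score` can also be used for further diagnostics, such as a confusion matrix. A minimal pure-Python sketch (the `ref`/`hyp` lists below are made up for illustration; in the tutorial they come from the call above):

```python
from collections import Counter

def confusion_matrix(ref, hyp, labels):
    """Count every (true label, predicted label) pair."""
    pairs = Counter(zip(ref, hyp))
    return {(t, p): pairs.get((t, p), 0) for t in labels for p in labels}

# Made-up example lists for illustration
ref = ["normal", "normal", "pneumothorax", "pneumothorax"]
hyp = ["normal", "pneumothorax", "pneumothorax", "pneumothorax"]

cm = confusion_matrix(ref, hyp, ["normal", "pneumothorax"])
print(cm[("normal", "pneumothorax")])  # 1 normal scan predicted as pneumothorax
```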
Before running inference, rename the ./out/MODEL_PATH/config.json file present in the model output directory to ./out/MODEL_PATH/preprocessor_config.json.
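To avoid doing this by hand every time, a small helper can create the file programmatically (a sketch using only the standard library; the path in the final comment is a placeholder for your actual output folder). It copies rather than moves, which also keeps config.json in place for loading the model itself:

```python
import os
import shutil

def make_preprocessor_config(model_dir):
    """Create preprocessor_config.json from config.json in model_dir."""
    src = os.path.join(model_dir, "config.json")
    dst = os.path.join(model_dir, "preprocessor_config.json")
    if os.path.isfile(src) and not os.path.isfile(dst):
        shutil.copyfile(src, dst)
    return dst

# Placeholder path; point it at your actual model output directory:
# make_preprocessor_config("./out/MyPneumoModel/<RUN_ID>/model/")
```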
```python
from transformers import ViTFeatureExtractor, ViTForImageClassification
from hugsvision.inference.VisionClassifierInference import VisionClassifierInference

path = "./out/MyPneumoModel/20_2021-08-20-01-46-44/model/"
img  = "../../../samples/pneumothorax/with.jpg"

classifier = VisionClassifierInference(
    feature_extractor = ViTFeatureExtractor.from_pretrained(path),
    model             = ViTForImageClassification.from_pretrained(path),
)

label = classifier.predict(img_path=img)
print("Predicted class:", label)
```