This notebook demonstrates how to use Intel's OpenVINO horizontal text detection model on the Total-Text dataset with the help of the open-source tool FiftyOne. The notebook walks through loading the dataset of images with ground truth detections into FiftyOne, visualizing the images, adding predictions from the model, and evaluating those predictions against the ground truth.
Download the Total-Text dataset into the same directory where you have the notebook, then install the dependencies:
!pip install fiftyone opencv-python openvino numpy wget
import fiftyone as fo
import os
import glob
import cv2
import re
import numpy as np
import wget
from openvino.runtime import Core
To add samples to the dataset, we loop through the images in the Train and Test folders and use the polygonal ground truth annotation text files to get the ground truth bounding box detections and labels. Other ground truths are available in the Annotation folder, such as character-level annotations and text region masks, which users can choose between depending on the model they are evaluating. For the purposes of this notebook, we focus on horizontal bounding box text detection.
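Before processing every file, it helps to see how a single annotation line is parsed. The sketch below applies the same `re.findall` patterns used in the loop that follows to one hypothetical annotation line in the documented format:

```python
import re

# One line from a hypothetical poly_gt_*.txt file, in the documented format
line = "x: [[153 161 179 195 184 177]], y: [[347 323 305 315 331 357]], ornt: [u'c'], transcriptions: [u'the']"

# x and y are the polygon coordinates; \d+\.\d+|\d+ matches floats and ints
x = re.findall(r'\d+\.\d+|\d+', line.split(',')[0])  # ['153', '161', ...]
y = re.findall(r'\d+\.\d+|\d+', line.split(',')[1])

# ornt and transcriptions sit at index 2 of the matches
# (indices 0 and 1 are the key name and the u string prefix)
ornt = re.findall(r'[a-z]+|\#', line.split(',')[2])[2]                   # 'c'
transcriptions = re.findall(r'[A-Za-z]+|\#|\d+', line.split(',')[3])[2]  # 'the'
```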
# Create samples for your data
samples = []
for dataname in ['Train', 'Test']:
    # Loop through the Train and Test folder paths to add samples to the dataset
    images_patt = "./Total-Text/" + dataname + "/*"
    for filepath in glob.glob(images_patt):
        # Create image samples tagged with their split, Train or Test
        # Tags give you the flexibility to work with only the Train or Test samples
        sample = fo.Sample(filepath=filepath, tags=[dataname])
        # Get the height and width of the image
        img = cv2.imread(filepath, cv2.IMREAD_UNCHANGED)
        height = img.shape[0]
        width = img.shape[1]
        # Get the filename (e.g. img1001) from the filepath in an OS-independent way
        filename = os.path.splitext(os.path.basename(filepath))[0]
        # List of test images that are skipped due to incorrect formatting of their
        # polygonal annotation .txt files. The correct polygonal annotation format is
        # "x: [[153 161 179 195 184 177]], y: [[347 323 305 315 331 357]], ornt: [u'c'], transcriptions: [u'the']\n"
        test_images_to_avoid = ['img551', 'img621', 'img623']
        if filename not in test_images_to_avoid:
            # Path to the polygonal annotation text file
            annotation_path = "./Total-Text/Annotation/groundtruth_polygonal_annotation/" + dataname + "/poly_gt_" + filename + ".txt"
            with open(annotation_path, "r") as f:
                # Each polygonal annotation text file is read line by line
                # For each line, we extract key-value pairs using the `re` regular expressions library
                # x and y are the polygon coordinates, ornt is the orientation of the text,
                # and transcriptions gives us the detected text value
                polylines = []
                lines = f.readlines()
                for line in lines:
                    # Using re.findall, extract the values of x, y, ornt, and
                    # transcriptions by pattern matching
                    # For x and y we look for numbers using \d; for ornt and
                    # transcriptions we match lowercase and uppercase letters,
                    # as well as '#' in some cases
                    x = re.findall(r'\d+\.\d+|\d+', line.split(',')[0])
                    y = re.findall(r'\d+\.\d+|\d+', line.split(',')[1])
                    # For ornt and transcriptions, fall back to a placeholder
                    # when the value is missing or empty
                    if len(re.findall(r'[a-z]+|\#', line.split(',')[2])) == 3:
                        ornt = re.findall(r'[a-z]+|\#', line.split(',')[2])[2]
                    else:
                        ornt = "no_value"
                    if len(re.findall(r'[A-Za-z]+|\#|\d+', line.split(',')[3])) == 3:
                        transcriptions = re.findall(r'[A-Za-z]+|\#|\d+', line.split(',')[3])[2]
                    else:
                        transcriptions = "no_label"
                    # Normalize the x and y values to [0, 1] using the image width and height
                    x = [round(float(i) / width, 2) for i in x]
                    y = [round(float(i) / height, 2) for i in y]
                    # Convert to a list of lists of (x, y) tuples
                    points = [list(zip(x, y))]
                    # Create the polyline label
                    polyline = fo.Polyline(points=points, closed=True)
                    # To evaluate a model that also predicts text labels, you can pass
                    # label=transcriptions to fo.Polyline above. Alternatively, when
                    # neither the ground truth nor the model predictions have labels,
                    # you can use an arbitrary label such as label='detected'
                    polylines.append(polyline)
                # Add the polyline labels to the sample
                sample["ground_truth_polylines"] = fo.Polylines(polylines=polylines)
                # Add ground truth labels as bounding boxes tightly enclosing the polylines
                sample["ground_truth"] = sample["ground_truth_polylines"].to_detections()
            samples.append(sample)
# Create dataset
dataset = fo.Dataset("Total-Text-dataset-FO-1")
dataset.add_samples(samples)
session = fo.launch_app(dataset=dataset)
# Print some information about the dataset
print(dataset)
Name:        Total-Text-dataset-FO-1
Media type:  image
Num samples: 1552
Persistent:  False
Tags:        []
Sample fields:
    id:                     fiftyone.core.fields.ObjectIdField
    filepath:               fiftyone.core.fields.StringField
    tags:                   fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:               fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    ground_truth_polylines: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Polylines)
    ground_truth:           fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
# Print a ground truth detection
sample = dataset.first()
print(sample.ground_truth.detections[0])
<Detection: {
    'id': '646e4caddaa7b9101742894a',
    'attributes': {},
    'tags': [],
    'label': None,
    'bounding_box': [0.3, 0.61, 0.09000000000000002, 0.09999999999999998],
    'mask': None,
    'confidence': None,
    'index': None,
}>
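The `to_detections()` call used earlier converts each closed polyline into the tightest axis-aligned box enclosing its points, in FiftyOne's `[top-left-x, top-left-y, width, height]` relative format. A minimal sketch of that conversion, with hypothetical normalized points chosen to reproduce the box printed above:

```python
# Hypothetical normalized polyline points (x, y), each in [0, 1]
points = [(0.30, 0.71), (0.39, 0.61), (0.35, 0.65)]

xs = [x for x, _ in points]
ys = [y for _, y in points]

# FiftyOne bounding boxes are [top-left-x, top-left-y, width, height]
bounding_box = [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]
# [0.3, 0.61, ~0.09, ~0.10], up to floating-point rounding
```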
# predictions_view = dataset.match_tags(["Train", "Test"]) # for whole dataset
predictions_view = dataset.match_tags(["Test"]) # for test dataset
An OpenVINO IR (Intermediate Representation) model consists of an .xml file, containing information about network topology, and a .bin file, containing the weights and biases binary data. The read_model() function expects the .bin weights file to have the same filename and be located in the same directory as the .xml file.
# Get the current working directory
cwd = os.path.abspath(os.getcwd())
# Create a new directory named 'model' (no-op if it already exists)
os.makedirs(os.path.join(cwd, 'model'), exist_ok=True)
# Define the source urls
model_xml_url = 'https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/main/notebooks/004-hello-detection/model/horizontal-text-detection-0001.xml'
model_bin_url = 'https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/004-hello-detection/model/horizontal-text-detection-0001.bin?raw=true'
# Define the destination file paths
xml_path = os.path.join(cwd, 'model', 'horizontal-text-detection-0001.xml')
bin_path = os.path.join(cwd, 'model', 'horizontal-text-detection-0001.bin')
# Download the files to their respective paths
wget.download(model_xml_url, out=xml_path)
wget.download(model_bin_url, out=bin_path)
ie = Core()
model = ie.read_model(model="./model/horizontal-text-detection-0001.xml")
compiled_model = ie.compile_model(model=model, device_name="CPU")
input_layer_ir = compiled_model.input(0)
output_layer_ir = compiled_model.output("boxes")
with fo.ProgressBar() as pb:
    for sample in pb(predictions_view):
        # Text detection models expect an image in BGR format
        img = cv2.imread(sample.filepath)
        # Height and width of the image
        height = img.shape[0]
        width = img.shape[1]
        # N, C, H, W = batch size, number of channels, height, width
        N, C, H, W = input_layer_ir.shape
        # Resize the image to the network's expected input size
        resized_image = cv2.resize(img, (W, H))
        # Reshape to the network input shape
        input_image = np.expand_dims(resized_image.transpose(2, 0, 1), 0)
        # Create an inference request
        boxes = compiled_model([input_image])[output_layer_ir]
        # Remove zero-only boxes
        boxes = boxes[~np.all(boxes == 0, axis=1)]
        # Compute the ratios between the original and resized image to avoid
        # placing bounding boxes at the wrong locations
        (real_y, real_x), (resized_y, resized_x) = img.shape[:2], resized_image.shape[:2]
        ratio_x, ratio_y = real_x / resized_x, real_y / resized_y
        # Convert detections to FiftyOne format
        detections = []
        for i in range(len(boxes)):
            # Convert floats to ints and multiply the corner positions of each box
            # by the x and y ratios
            # If a bounding box is found at the very top of the image, position its
            # upper edge a little lower to make it visible on the image
            (x1, y1, x2, y2) = [
                int(max(corner_position * ratio_y, 10)) if idx % 2
                else int(corner_position * ratio_x)
                for idx, corner_position in enumerate(boxes[i][:-1])
            ]
            # Convert to [top-left-x, top-left-y, width, height]
            # in relative coordinates in [0, 1] x [0, 1]
            rel_box = [x1 / width, y1 / height, (x2 - x1) / width, (y2 - y1) / height]
            detections.append(
                fo.Detection(
                    bounding_box=rel_box,
                    confidence=boxes[i][4],
                )
            )
        sample["horizontal_detection"] = fo.Detections(detections=detections)
        sample.save()
100% |█████████████████| 297/297 [19.2s elapsed, 0s remaining, 15.9 samples/s]
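The coordinate conversion inside the loop above can be hard to follow in isolation. The standalone sketch below walks one hypothetical model box through the same steps, assuming a 1280x720 source image and a 704x704 network input (both sizes are assumptions for illustration):

```python
# Assumed sizes: a hypothetical 1280x720 source image and a 704x704 model input
width, height = 1280, 720
resized_x, resized_y = 704, 704
ratio_x, ratio_y = width / resized_x, height / resized_y

# A hypothetical model box in resized-image pixels: (x_min, y_min, x_max, y_max)
box = (100.0, 50.0, 200.0, 90.0)

# Scale each corner back to original-image pixels; odd indices are y coordinates,
# which are also clamped to at least 10 px so top-edge boxes stay visible
(x1, y1, x2, y2) = [
    int(max(c * ratio_y, 10)) if i % 2 else int(c * ratio_x)
    for i, c in enumerate(box)
]

# Normalize to FiftyOne's relative [top-left-x, top-left-y, width, height] format
rel_box = [x1 / width, y1 / height, (x2 - x1) / width, (y2 - y1) / height]
```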
session.view = predictions_view
session.show()
results = predictions_view.evaluate_detections(
"horizontal_detection", gt_field="ground_truth",use_boxes=True, classwise=False, eval_key="eval"
)
Evaluating detections... 100% |█████████████████| 297/297 [9.0s elapsed, 0s remaining, 39.4 samples/s]
The OpenVINO horizontal text detection model only detects bounding boxes and does not return a label for the detected text. Therefore, when evaluating detections with the `evaluate_detections()` function, the `classwise` parameter is set to `False`, and ground truth labels for the detected text are not added to the dataset. For a model that does return labels for the detected text, you can set `classwise` to `True` and add the ground truth labels to the dataset.
# Convert to evaluation patches
eval_patches = dataset.to_evaluation_patches("eval")
print(eval_patches)
Dataset:     Total-Text-dataset-FO-1
Media type:  image
Num patches: 13422
Patch fields:
    id:                   fiftyone.core.fields.ObjectIdField
    sample_id:            fiftyone.core.fields.ObjectIdField
    filepath:             fiftyone.core.fields.StringField
    tags:                 fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:             fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    ground_truth:         fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    horizontal_detection: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    crowd:                fiftyone.core.fields.BooleanField
    type:                 fiftyone.core.fields.StringField
    iou:                  fiftyone.core.fields.FloatField
View stages:
    1. ToEvaluationPatches(eval_key='eval', config=None)
print(eval_patches.count_values("type"))
{None: 10589, 'fp': 303, 'fn': 580, 'tp': 1950}
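As a sanity check, the `type` counts printed above can be turned into precision and recall by hand (the TP/FP/FN values below are the ones from this run):

```python
# TP/FP/FN counts from the `count_values("type")` output above
tp, fp, fn = 1950, 303, 580

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f}, recall={recall:.3f}, f1={f1:.3f}")
```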
# View patches in the App
session.view = eval_patches