Scanned Text Recognition

functionality (mostly done)
- resolution detection, deep language modeling
- font identification, reading order detection
- upsampling, downsampling, better noise removal
- character and word bounding boxes
training / data
- larger, more diverse training sets
- large scale self-supervised training (research)

Scanned Text Recognition (more)

other work
- purely convolutional OCR (better suited to current accelerators)
- replace line normalization, layout extraction with deep models
- replace data augmentation with deep models (e.g. GANs)
- better semantic segmentation (text, image, table, graph, figure, ...)
- non-CTC models and/or automatic decoding
- benchmarking of attention-based models
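For context on the CTC items in this list, the following is a minimal sketch of greedy (best-path) CTC decoding, the simplest decoder that non-CTC or attention-based models would be compared against. The function name `ctc_greedy_decode`, the `alphabet` layout, and the choice of index 0 as the blank symbol are assumptions made for illustration, not part of any particular OCR library.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores: list of per-time-step score lists, one score per class.
    alphabet:     list mapping class index to character (assumed layout;
                  the entry at the blank index is never emitted).
    """
    decoded = []
    previous = blank
    for scores in frame_scores:
        # Most likely class at this time step.
        label = max(range(len(scores)), key=scores.__getitem__)
        # Emit only when the label is not blank and differs from the last frame.
        if label != blank and label != previous:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)


# Hypothetical scores for the frame sequence: a, a, blank, a, b, b
frames = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]
text = ctc_greedy_decode(frames, alphabet=["", "a", "b"])
```

The blank between the second and third "a" frames is what lets the decoder keep a doubled letter; without it the repeated frames would merge into a single character.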

Camera Captured Recognition

functionality
- page boundary detection models
- DL dewarping
- DL depth estimation (from RGB, from stereo)
training / data
- large collection of photographically captured images
- automatic generation of photographically distorted images (ray tracing)
- automatic DL-based data augmentation
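As a rough illustration of synthetic photographic distortion, the sketch below warps the corners of a flat page with a planar homography. This is a much simpler stand-in than the ray tracing mentioned above: it models only the perspective of a flat page and ignores page curl, lighting, and lens effects. The function name `apply_homography` and the example matrix are assumptions chosen for illustration.

```python
def apply_homography(h, point):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = point
    # Projective scale factor; dividing by it produces the perspective effect.
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (
        (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
        (h[1][0] * x + h[1][1] * y + h[1][2]) / w,
    )


# Hypothetical homography: the right edge of the page recedes from the camera.
H = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
]

# Page corners in normalized coordinates (unit square).
corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
warped = [apply_homography(H, c) for c in corners]
```

Here the left edge keeps its height while the right edge is shrunk to half size, which is the foreshortening seen when a page is photographed at an angle. Applying the same mapping to every pixel, and to any ground-truth bounding boxes, would give a distorted training image with matching labels.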

Scene Text Recognition

functionality
- DL text detection / extraction (reimplement standard convolutional models)
training / data
- good datasets exist; train on them
- benchmark CTC vs attention-based models
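A benchmark of this kind needs a common error metric. The sketch below computes character error rate (CER) from Levenshtein edit distance, which is a common choice for comparing recognizers; whether this is the metric intended here is an assumption, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion from ref
                cur[j - 1] + 1,          # insertion into ref
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def character_error_rate(references, hypotheses):
    """Total edit distance divided by total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return errors / total if total else 0.0


# Hypothetical ground truth and model output for one text line.
cer = character_error_rate(["scene text"], ["scene test"])
```

Running both model families over the same test set and comparing the resulting CER values keeps the comparison independent of how each model decodes its output.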
