%load_ext autoreload
%autoreload 2
from fusus.book import Book
B = Book(cd="~/github/among/fusus/example")
# cd to the book directory
!cd `pwd`
We show the line division in every block of text in every page. Check visually whether all lines have been detected correctly.
The histogram stage shows the blocks that have been detected on the page, and within the blocks the histograms that correspond to the ink distribution.
We mark the start and end of lines with orange and purple dots, which are obtained by applying a rolling median filter to the positions of the first and last black pixel on each pixel line.
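To make the idea of the start and end marks concrete, here is a minimal sketch in plain Python. It is not the fusus implementation: the function names `ink_extent_per_row` and `rolling_median`, the window size, and the toy image are all assumptions made for illustration. It finds the first and last ink pixel on each pixel row and smooths those positions with a rolling median, so that an isolated speck does not shift the detected margin.

```python
from statistics import median


def ink_extent_per_row(image):
    """Return (first, last) ink column per pixel row, or None for a blank row.

    `image` is a list of rows; each row is a list of 0 (paper) and 1 (ink).
    Hypothetical helper, not part of fusus.
    """
    extents = []
    for row in image:
        cols = [x for x, value in enumerate(row) if value]
        extents.append((cols[0], cols[-1]) if cols else None)
    return extents


def rolling_median(values, window=3):
    """Rolling median over a centred window that is truncated at the edges."""
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


# Toy binarized image: a solid text band with one stray speck at column 0.
image = [
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 1, 1, 0, 0],  # stray speck pulls the raw start to column 0
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
]

extents = ink_extent_per_row(image)
starts = [e[0] for e in extents if e is not None]
ends = [e[1] for e in extents if e is not None]

smooth_starts = rolling_median(starts)
smooth_ends = rolling_median(ends)
```

In this toy case the raw start positions contain one outlier caused by the speck, and the median filter removes it. The real pipeline may use a different window and different handling of blank rows.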
In each block the main line bands are shown.
A green rule marks the start of a band, a red rule the end.
The space between bands is greyed out.
We show the main bands, which are derived directly from the histogram.
The main bands may not contain all the ink, but do not worry: the bands are used to target the cleaning of marks, and are not visible to the rest of the processing stages.
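As a rough illustration of how bands can be read off a histogram, the sketch below thresholds a per-row ink count and reports the row ranges that rise above the threshold. This is a deliberately simplified stand-in: judging by the `peakSignificant` and `peakProminenceY` settings, fusus itself appears to rely on peak detection, and the function name `bands_from_histogram`, the threshold, and the sample histogram are assumptions made for this example only.

```python
def bands_from_histogram(hist, threshold):
    """Return (start, end) row ranges, end exclusive, where hist exceeds threshold.

    Hypothetical helper that uses simple thresholding, not the fusus method.
    """
    bands = []
    start = None
    for y, count in enumerate(hist):
        if count > threshold and start is None:
            start = y  # entering a band
        elif count <= threshold and start is not None:
            bands.append((start, y))  # leaving a band
            start = None
    if start is not None:
        # The histogram ended while still inside a band.
        bands.append((start, len(hist)))
    return bands


# Ink count per pixel row for two text lines; rows 1, 5 and 8 hold a little
# ink (for example diacritics) that stays below the threshold.
hist = [0, 1, 6, 7, 6, 1, 0, 0, 1, 5, 8, 5, 0]
bands = bands_from_histogram(hist, threshold=2)

# Rows that contain ink but fall outside every band.
outside = [
    y for y, count in enumerate(hist)
    if count > 0 and not any(start <= y < end for start, end in bands)
]
```

The `outside` list shows the point made above: some ink lies outside the main bands, which is acceptable because the bands only serve to target the cleaning of marks.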
Check in particular:
Page 101 is a critical page: both errors are likely to occur!
def checkLines(pg, quiet=True, **kwargs):
    if pg is None:
        for pg in B.allPagesList:
            page = B.process(
                batch=False,
                pages=pg,
                doOcr=False,
                uptoLayout=True,
                quiet=quiet,
                **kwargs,
            )
            page.show(stage="histogram")
    else:
        page = B.process(
            batch=False,
            pages=pg,
            doOcr=False,
            uptoLayout=True,
            quiet=quiet,
            **kwargs,
        )
        page.show(stage="histogram")
    return page
# B.configure(blurY=None, peakSignificant=0.1, peakProminenceY=None, valleyProminenceY=None, debug=0)
page = checkLines(101)
0.00s Batch of 1 pages: 101 0.00s Start batch processing images | | | -0.00s 1 101.jpg 1.27s all done
page = checkLines(None)
0.00s Batch of 1 pages: 47 0.00s Start batch processing images | | | 5.10s 1 047.tif 1.35s all done
0.00s Batch of 1 pages: 48 0.00s Start batch processing images | | | 6.46s 1 048.tif 1.28s all done
0.00s Batch of 1 pages: 58 0.00s Start batch processing images | | | 7.77s 1 058.tif 1.21s all done
0.00s Batch of 1 pages: 59 0.00s Start batch processing images | | | 9.14s 1 059.tif 1.28s all done
0.00s Batch of 1 pages: 63 0.00s Start batch processing images | | | 11s 1 063.tif 1.28s all done
0.00s Batch of 1 pages: 67 0.00s Start batch processing images | | | 12s 1 067.tif 1.29s all done
0.00s Batch of 1 pages: 101 0.00s Start batch processing images | | | 13s 1 101.jpg 1.16s all done
0.00s Batch of 1 pages: 102 0.00s Start batch processing images | | | 15s 1 102.jpg 1.30s all done
0.00s Batch of 1 pages: 111 0.00s Start batch processing images | | | 17s 1 111.jpg 2.27s all done
0.00s Batch of 1 pages: 112 0.00s Start batch processing images | | | 19s 1 112.jpg 2.17s all done
0.00s Batch of 1 pages: 113 0.00s Start batch processing images | | | 22s 1 113.jpg 2.22s all done
0.00s Batch of 1 pages: 121 0.00s Start batch processing images | | | 23s 1 121.jpg 1.65s all done
0.00s Batch of 1 pages: 122 0.00s Start batch processing images | | | 25s 1 122.jpg 1.83s all done
0.00s Batch of 1 pages: 131 0.00s Start batch processing images | | | 28s 1 131.jpg 2.62s all done
0.00s Batch of 1 pages: 132 0.00s Start batch processing images | | | 30s 1 132.jpg 2.09s all done
0.00s Batch of 1 pages: 200 0.00s Start batch processing images | | | 36s 1 200.tif 5.21s all done
0.00s Batch of 1 pages: 300 0.00s Start batch processing images | | | 40s 1 300.tif 3.89s all done
0.00s Batch of 1 pages: 400 0.00s Start batch processing images | | | 45s 1 400.tif 4.54s all done