Here I'll describe how to view baselines in Tesseract's interactive debug environment.
Once you've completed step 5 of the previous tutorial and the debug window has appeared, do the following:
- In the main menu choose Modes->Show BL Norm Word. The UI will not visibly react. This is normal.
- Now click on any word you're interested in. A new window titled BlnWords should appear.
- At first sight the BlnWords window looks empty, but it is not. Nothing is visible only because of the quirky scaling logic used by ScrollView. To find the contents, use the window scrollbars to pan and the mouse scroll wheel to zoom in and out. I suggest the following sequence for setting up the view initially:
  - slowly drag the vertical scrollbar thumb down until you see baselines and/or outlines,
  - move the horizontal scrollbar thumb approximately to the center,
  - use the mouse wheel to scale the window contents to a comfortable size,
  - resize the window to your taste if you wish.
Baseline detection strongly influences character classification. The same character shape placed at different positions relative to the baseline can lead to completely different recognition results. That's why incorrect baselines are a frequent source of errors in Tesseract recognition.
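
To make the dependence on the baseline concrete, here is a minimal sketch of baseline normalization. It is not Tesseract's actual code: the function names are hypothetical, the y axis is assumed to increase upward, and the target values (baseline at 64, x-height of 128) are assumed to mirror Tesseract's baseline-normalized space. It shows how one and the same small mark ends up in the "apostrophe" zone or the "comma" zone depending only on where the baseline was estimated.

```python
# Assumed targets of the baseline-normalized space (illustrative values).
BLN_BASELINE_OFFSET = 64   # y coordinate of the baseline after normalization
BLN_X_HEIGHT = 128         # x-height after normalization


def bl_normalize(y, baseline_y, x_height):
    """Map an image-space y (increasing upward) into baseline-normalized space."""
    return (y - baseline_y) * BLN_X_HEIGHT / x_height + BLN_BASELINE_OFFSET


def normalize_box(bottom, top, baseline_y, x_height):
    """Normalize the vertical extent (bottom, top) of a blob's bounding box."""
    return (bl_normalize(bottom, baseline_y, x_height),
            bl_normalize(top, baseline_y, x_height))


X_HEIGHT = 32      # hypothetical x-height of the text line, in pixels
mark = (124, 132)  # vertical extent of one small mark in the image

# Correct baseline at y=100: the mark sits high above it, like an apostrophe.
as_apostrophe = normalize_box(*mark, baseline_y=100, x_height=X_HEIGHT)

# Baseline wrongly estimated at y=128: the same mark now straddles it, like a comma.
as_comma = normalize_box(*mark, baseline_y=128, x_height=X_HEIGHT)

print(as_apostrophe)  # (160.0, 192.0)
print(as_comma)       # (48.0, 80.0)
```

The pixels of the mark are identical in both cases; only the baseline estimate differs, yet a classifier working on the normalized coordinates sees two very different inputs.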
A few examples. Let's take the "conventional" phototest.tif file: