I am converting scanned PDFs to plain text. Overall it works very well, but I have run into a problem with the reading order. In documents containing tabular data, the text seems to be read column by column, when a more natural order would be row by row. Here is a very small example:
This is column A, row 1    This is column B, row 1    This is column C, row 1
This is column A, row 2    This is column B, row 2    This is column C, row 2
is read as:
This is column A, row 1
This is column A, row 2
This is column B, row 1
This is column B, row 2
This is column C, row 1
This is column C, row 2
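If the number of rows is known in advance and every cell comes out as its own piece of text, the column-by-column output above can be put back into row order with a simple transpose. A minimal sketch, assuming the OCR output has already been split into a list of cell strings in column-major order and that the grid is complete (the function name `column_major_to_rows` is hypothetical):

```python
def column_major_to_rows(cells, n_rows):
    """Regroup cells that were read column by column into rows.

    Assumes a complete grid: len(cells) must be a multiple of n_rows.
    """
    if n_rows <= 0 or len(cells) % n_rows != 0:
        raise ValueError("cell count must be a multiple of n_rows")
    n_cols = len(cells) // n_rows
    # In column-major order, cell (row r, column c) sits at index c * n_rows + r.
    return [[cells[c * n_rows + r] for c in range(n_cols)] for r in range(n_rows)]


ocr_output = [
    "This is column A, row 1",
    "This is column A, row 2",
    "This is column B, row 1",
    "This is column B, row 2",
    "This is column C, row 1",
    "This is column C, row 2",
]

for row in column_major_to_rows(ocr_output, n_rows=2):
    print("    ".join(row))
```

This only helps when the table shape is fixed and known; it does nothing for merged cells or for text that spans several lines within a cell.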
I have started reading the documentation and am experimenting by trial and error, more or less brute force, but if someone has already solved this problem I would appreciate any insights on how to fix it. It may involve some training data, but I do not know how that works.
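A more general workaround, if your OCR engine can export word positions, is to ignore its reading order entirely and rebuild the lines from coordinates: group words whose vertical centres are close together, then sort each group left to right. A rough sketch, assuming each word is available as a hypothetical `(text, left, top, height)` tuple in pixel coordinates and that table rows do not overlap vertically; the `tolerance` factor is a guess that would need tuning per document:

```python
from typing import List, Tuple

Word = Tuple[str, int, int, int]  # hypothetical layout: (text, left, top, height) in pixels


def words_to_lines(words: List[Word], tolerance: float = 0.5) -> List[str]:
    """Group words into text lines by vertical position, then order each line left to right."""
    lines: List[List[Word]] = []
    # Visit words from top to bottom by vertical centre, so each line starts with its topmost word.
    for word in sorted(words, key=lambda w: w[2] + w[3] / 2):
        _, _, top, height = word
        centre = top + height / 2
        if lines:
            first = lines[-1][0]
            first_centre = first[2] + first[3] / 2
            # Same line if the centres differ by less than a fraction of the taller word's height.
            if abs(centre - first_centre) <= tolerance * max(height, first[3]):
                lines[-1].append(word)
                continue
        lines.append([word])
    # Within each line, restore left-to-right order using the left coordinate.
    return [" ".join(w[0] for w in sorted(line, key=lambda w: w[1])) for line in lines]


words = [
    ("A2", 10, 52, 10),
    ("B1", 110, 11, 10),
    ("A1", 10, 10, 10),
    ("B2", 110, 50, 10),
]
print(words_to_lines(words))
```

Because grouping is driven by position rather than by the engine's own block detection, cells in the same table row end up on the same output line, at the cost of losing any explicit column boundaries.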
You can play with its various