[reportlab-users] Building a PDF file with Images and "OCR" searchable text

Glenn Linderman v+python at g.nevcal.com
Sun Sep 23 02:51:40 EDT 2012


Acrobat has a feature to take scanned images, OCR them, and place the
OCR'd text at the same location and approximate font face & size as the
text in the scanned image, and allow searching on it.

I have an application where I have the images (computer generated, but
as full-page graphics), and I have the text (what was fed into the
program that generated the full-page graphics). I would like to make a
PDF file of the same nature as what Acrobat produces, with some relaxed
constraints: I don't care about the placement, font face, or size of
the text, as long as it is on the same page. I just want users to be
able to find the right page by searching for the text.

Is this possible in reportlab?

I wouldn't know where to start: is the OCR'd text some special feature,
or a special layer, or a hidden layer?

I thought about alternating pages (text, image, text, image) so that
the search could be done and the user could then simply go to the next
page to find the image. But that is visually disruptive to the user, and
the text is all on the image page anyway, so it is also visually
redundant. Using text like what the OCR feature produces would be more
elegant, if I could get some tips on how to get started.

