[reportlab-users] Building a PDF file with Images and "OCR" searchable text

Andy Robinson andy at reportlab.com
Sun Sep 23 09:45:27 EDT 2012


On 23 September 2012 07:51, Glenn Linderman <v+python at g.nevcal.com> wrote:

> Is this possible in reportlab?


I honestly don't know how Acrobat does it. There is a recent PDF feature,
which we have not implemented, for embedding an XML text version of a
document somewhere inside the PDF file for easy indexing.

However, there's an easy enough trick. You just need to draw that
text on the same page as the image using some combination of (a) a
white fill colour for the text ('white on white'), (b) a small font
size, or placing it behind the image, or (c) placing it off the edge of
the visible page. Then all the normal text search tools should find it.

Are you drawing in a flowing, Platypus mode, or using pdfgen to
manually place images and control page breaks? If the latter, then
you could construct a Paragraph object, put all the text in it, pick a
font size small enough that the paragraph is almost certain to fit
within the area of the scanned image, position it on the page, and then
draw your page image over the top (calling the paragraph's wrapOn() and
drawOn() methods yourself). If you are doing it in Platypus it's a bit
fiddlier, but we can probably send you a code snippet for that.

Please let us know if this works and especially how it shows up in
Acrobat Reader ;-)

--
Andy
