[reportlab-users] Reportlab text not searchable in Apple OSX Preview.App? but searchable in Acrobat and google-pdf viewer ?

Matt Folwell mjf at pearson.co.uk
Wed Aug 5 13:31:48 EDT 2009


Robin Becker wrote:
> Tim Roberts wrote:
>> Robin Becker wrote:
>>> ........
>>> It seems that preview is actually looking at the rendered image to
>>> find the characters.
>>
>> I agree that this matches the symptoms, but as a programmer, how would
>> you do that? The rendered image is just a matrix of pixels. How would
>> you search for words, or even letters, for that matter?
>>
>> When I encounter a bug, I always like to put myself in the mind of the
>> programmer to figure out what thinking would have led to the bug. I'm
>> having a hard time coming up with an implementation that would trigger
>> this. Maybe they are converting the PDF to some kind of intermediate
>> language (like a Windows EMF), where strings that aren't horizontal get
>> converted into a series of smaller strings that ARE horizontal, and they
>> are searching that intermediate format. I'd call that "overthinking the
>> problem".
>>
> I saw some related Preview search problems on one of the TeX lists when
> I was googling. They seemed to be recommending the use of standard fonts
> and the like to improve searchability; that would come down to some kind
> of OCR-like weakness. Mac people: are GIFs searchable?



I tried a JPEG, and everything in the Find menu was greyed out.

I made a PDF with the following script, then copied and pasted all of its
text from Preview into TextEdit.

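import reportlab.pdfgen.canvas as canvas

c = canvas.Canvas("twolines.pdf")
c.setFont("Courier", 10)
c.rotate(90)  # rotate user space 90 degrees counterclockwise
# In the rotated frame, negative y moves each baseline to the right of the
# page's left edge, so both strings run vertically up the page.
c.drawString(0, -10, "ABCDEFG")
c.drawString(0, -20, "1234567")
c.rotate(-90)  # undo the rotation
c.showPage()
c.save()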
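To see what a viewer is up against, it helps to work out where the script above actually puts each character. This is a rough sketch that only does the geometry, assuming ReportLab's `rotate` is counterclockwise in degrees and that every Courier glyph is 600/1000 em wide (6 pt at 10 pt). `glyph_origins` is a hypothetical helper, not part of ReportLab.

```python
import math

# Assumption: every Courier glyph is 600/1000 em wide, so at 10 pt each
# character advances 6 pt along its baseline.
FONT_SIZE = 10.0
COURIER_ADVANCE = 0.6 * FONT_SIZE


def glyph_origins(text, x, y, angle_deg, advance=COURIER_ADVANCE):
    """Map each glyph origin of a string drawn at (x, y), in a user space
    rotated counterclockwise by angle_deg, back to page coordinates."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    points = []
    for i, ch in enumerate(text):
        ux, uy = x + i * advance, y  # origin in the rotated user space
        px = ux * cos_t - uy * sin_t  # counterclockwise rotation
        py = ux * sin_t + uy * cos_t
        # Round away floating-point noise; "+ 0.0" turns -0.0 into 0.0.
        points.append((ch, round(px, 6) + 0.0, round(py, 6) + 0.0))
    return points


# The two drawString calls from the script above.
for baseline_y, text in ((-10, "ABCDEFG"), (-20, "1234567")):
    print(glyph_origins(text, 0, baseline_y, 90))
```

Each string keeps a single x coordinate (10 pt and 20 pt from the left edge) while y climbs in 6 pt steps. The text therefore reads bottom to top, and there is no horizontal baseline for a viewer to follow.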


The copy-pasted text I got back was:

AB
CD
E
FG
12
34
5
67

That coincides quite nicely with Tim's suggestion that Preview is breaking
the rotated strings into shorter horizontal ones.
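Suppose Preview's Find runs over the same fragmented text it puts on the clipboard. That is an assumption: the test above only shows copy-and-paste behaviour. In that case, any query that crosses a fragment boundary would miss. The sketch below uses the exact text pasted above. `naive_find` and `fragment_insensitive_find` are hypothetical helpers, and neither reflects Preview's actual implementation.

```python
# Text copied out of Preview for twolines.pdf, as pasted above.
EXTRACTED = "AB\nCD\nE\nFG\n12\n34\n5\n67"


def naive_find(haystack, needle):
    """Plain substring search over the extracted text."""
    return needle in haystack


def fragment_insensitive_find(haystack, needle):
    """Hypothetical workaround: drop the line breaks inserted between
    fragments before searching."""
    return needle in haystack.replace("\n", "")


for query in ("AB", "ABCDEFG", "1234567", "CDEF"):
    print(query, naive_find(EXTRACTED, query),
          fragment_insensitive_find(EXTRACTED, query))
```

Only "AB" is found by the plain search, because it happens to fit inside one fragment. Stripping every line break is too blunt a fix, though, since it also glues the end of one string to the start of the next ("FG12"). A query such as "G1" would then match across two separate strings.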

--
Matt Folwell

