[reportlab-users] Reportlab text not searchable in Apple OSX Preview.App? but searchable in Acrobat and google-pdf viewer ?
Matt Folwell
mjf at pearson.co.uk
Wed Aug 5 13:31:48 EDT 2009
Robin Becker wrote:
> Tim Roberts wrote:
>> Robin Becker wrote:
>>> ........
>>> It seems that preview is actually looking at the rendered image to
>>> find the characters.
>>
>> I agree that this matches the symptoms, but as a programmer, how would
>> you do that? The rendered image is just a matrix of pixels. How would
>> you search for words, or even letters, for that matter?
>>
>> When I encounter a bug, I always like to put myself in the mind of the
>> programmer to figure out what thinking would have led to the bug. I'm
>> having a hard time coming up with an implementation that would trigger
>> this. Maybe they are converting the PDF to some kind of intermediate
>> language (like a Windows EMF), where strings that aren't horizontal get
>> converted into a series of smaller strings that ARE horizontal, and they
>> are searching that intermediate format. I'd call that "overthinking the
>> problem".
>>
> I saw some related Preview search problems on one of the tex lists when
> I was googling. They seemed to be recommending the use of standard fonts
> and the like to improve searchability; that would come down to some kind
> of OCR-like weakness. Mac people: are GIFs searchable?

I tried a JPEG, and everything on the Find menu was greyed out.

I then made a PDF with the following script, and copied and pasted all of
the text from Preview into TextEdit:

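import reportlab.pdfgen.canvas as canvas

c = canvas.Canvas("twolines.pdf")
c.setFont("Courier", 10)
c.rotate(90)  # rotate the coordinate system 90 degrees counterclockwise
c.drawString(0, -10, "ABCDEFG")  # both strings now run up the page
c.drawString(0, -20, "1234567")
c.rotate(-90)  # undo the rotation
c.showPage()
c.save()
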
The copy-pasted text I got back was:
AB
CD
E
FG
12
34
5
67

That fits quite nicely with Tim's suggestion that Preview is breaking
rotated strings into shorter ones.
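
For anyone who wants to see where the glyphs end up on the page, here is a
rough sketch that applies the same 90-degree rotation by hand. It does not
use ReportLab or look inside Preview. glyph_origins and COURIER_ADVANCE are
names made up for this illustration, and it assumes that only the glyph
origins matter. The one font fact it relies on is that every Courier glyph
is 600/1000 of the font size wide.

```python
import math

# Courier is monospaced: every glyph advances 600/1000 of the font size.
COURIER_ADVANCE = 0.6


def glyph_origins(text, x, y, font_size, angle_deg):
    """Return (char, page_x, page_y) for each glyph of a string drawn at
    (x, y) in a coordinate system rotated counterclockwise by angle_deg."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    origins = []
    for i, ch in enumerate(text):
        # Position in the rotated (user) coordinate system.
        ux = x + i * COURIER_ADVANCE * font_size
        uy = y
        # Map back to unrotated page coordinates.
        px = ux * cos_t - uy * sin_t
        py = ux * sin_t + uy * cos_t
        # Adding 0.0 turns a rounded -0.0 into 0.0.
        origins.append((ch, round(px, 3) + 0.0, round(py, 3) + 0.0))
    return origins


# The two strings from the script above.
for line, y in (("ABCDEFG", -10), ("1234567", -20)):
    for ch, px, py in glyph_origins(line, 0, y, 10, 90):
        print(ch, px, py)
```

Within each string every glyph comes out with the same page x and a
different page y, so a viewer that builds its text lines by grouping glyphs
that share a horizontal baseline would not see either string as a single
run. That does not explain the exact AB / CD / E / FG split, but it is
consistent with the fragments.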
--
Matt Folwell