[reportlab-users] Reportlab text not searchable in Apple OSX Preview.App? but searchable in Acrobat and google-pdf viewer ?

hari jayaram harijay at gmail.com
Wed Aug 5 09:05:53 EDT 2009


I also tried this test code:

import reportlab.pdfgen.canvas as canvas

c = canvas.Canvas("searchtext.pdf")
c.rotate(90)  # rotate the coordinate system, as in the real report code
textobj = c.beginText(10, -10)
textstring = "Dispense File Prefix: %s" % "hello world"
textobj.textLines(textstring)
c.drawText(textobj)
c.rotate(-90)  # undo the rotation
c.showPage()
c.save()

That produced the attached pdf (searchtext.pdf), which fails a search for any
of the text it contains in Preview.App (Picture 21.png).

I don't know what I am doing wrong.
Thanks for your help troubleshooting this. I hope the problem does not lie in
Preview.App, because the same pdfs are searchable in the Google pdf (Gmail)
reader and in Acrobat.
Hari



On Wed, Aug 5, 2009 at 9:02 AM, hari jayaram <harijay at gmail.com> wrote:


> Hi Bill and Robin, thanks for your replies.
>
> I tried the text object way of writing out the string, based on the code
> snippet you provided. However, the Preview search still does not work. I
> think I implemented what you suggested.
>
> The code snippet is shown here. The older version of my code used the
> canvas drawString method to render the string. The full code is at the
> github link (see below). That too gave unsearchable text.
>
> self.canvas_obj.rotate(90)
> textobj = self.canvas_obj.beginText(10,-10)
> # textobj.setTextRenderMode(INVISIBLE_MODE)
> textstring = "Dispense File Prefix: %s" % str(os.path.splitext(self.filename)[0])
> textstring = textstring.strip().encode('latin-1', 'replace')
> textobj.textLines(textstring)
> self.canvas_obj.drawText(textobj)
> # self.canvas_obj.drawString(10,-10,"DispenseFilePrefix: %s" % str(os.path.splitext(self.filename)[0]))
> self.canvas_obj.rotate(-90)
>
> Robin, I am trying to search for the text in the Preview.App search bar.
> This search bar works for every "text" pdf document I have. However, if you
> look at the attached png image or the pdf document, Preview seems to think
> each word has several spaces in it, so although the word is visibly present,
> semantically it is just a sequence of individual letters.
>
> Attachments:
> github source for report with drawString methods:
> http://github.com/harijay/protein-crystallization-gridmaker/blob/1ca03fd8aa85cd18b93ac63ff6447199d9799dcb/platepdfwriter.py
> Png images showing search result: Picture 20.png, dispense_not_found.png
> pdf file failing search, rendered with drawText code (based on Bill Janssen's
> suggestion): test2.pdf
>
> Hari
>
>

> On Tue, Aug 4, 2009 at 10:27 PM, Bill Janssen <janssen at parc.com> wrote:
>
>> hari jayaram <harijay at gmail.com> wrote:
>>
>> > I noticed however that the text laden pdfs I am rendering are not searchable
>> > using Apple Mac (Leopard) OSX Preview.App
>> >
>> > When I use the built in search within Preview.App only single characters
>> > light up ( only single characters show matches like a , b , c , d ) No words
>> > light up..
>>
>> Works fine for me, generating PDFs with ReportLab 2.2 and searching with
>> Preview.
>>
>> I add my text to the PDF a word at a time, with this code:
>>
>> textobj = mycanvas.beginText(word.left, word.baseline)
>> textobj.setTextRenderMode(INVISIBLE_MODE)
>> textstring = word.text.strip().encode('latin-1', 'replace')
>> textobj.textLines(textstring)
>> mycanvas.drawText(textobj)
>>
>> Incidentally, can I switch to UTF-8 these days?
>>
>> Bill


>>

>

>
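
On the .encode('latin-1', 'replace') step used in the quoted snippets: the
sketch below only illustrates what that call does in plain Python 3, namely
that any character outside Latin-1 is silently replaced by "?", whereas UTF-8
can represent it. The helper name to_latin1 is hypothetical. Whether the
ReportLab version in use accepts UTF-8 input directly is an assumption to check
against its documentation.

```python
def to_latin1(text):
    """Strip the text and encode it as Latin-1, replacing unencodable characters with '?'."""
    return text.strip().encode("latin-1", "replace")


# Characters that exist in Latin-1 survive as single bytes.
accented = to_latin1("  na\u00efve caf\u00e9  ")

# The euro sign (U+20AC) is not in Latin-1, so it is replaced by '?'.
euro = to_latin1("price: 5 \u20ac")

# UTF-8 round-trips the same string without loss.
utf8_roundtrip = "price: 5 \u20ac".encode("utf-8").decode("utf-8")

print(accented)
print(euro)
print(utf8_roundtrip == "price: 5 \u20ac")
```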

-------------- next part --------------
A non-text attachment was scrubbed...
Name: searchtext.pdf
Type: application/pdf
Size: 1918 bytes
Desc: not available
Url : <http://two.pairlist.net/pipermail/reportlab-users/attachments/20090805/eaa85a05/attachment-0001.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Picture 21.png
Type: image/png
Size: 37790 bytes
Desc: not available
Url : <http://two.pairlist.net/pipermail/reportlab-users/attachments/20090805/eaa85a05/attachment-0001.png>

