[reportlab-users] pageCatcher doesn't read PDFs created by PIL

Thu, 20 Nov 2003 21:52:11 -0600

Hi,

I'm writing a document management system in Python.  It scans pages,
makes PDFs of them with PIL, writes the pages to a database, and later
constructs aggregate PDFs on demand from a Web interface, using
ReportLab.

I've had no trouble combining various PDF files using pageCatcher,
except when I try to include PIL-generated PDFs.  My guess is that PIL
is doing something wrong in generating the PDFs, but I'm in the
unfortunate position of having several hundred images in the database
that I can't get out now.  I can't even convert the pages back to
another format, because PIL can write PDF but cannot read it.

Here's a very simple test case:

from reportlab.pdfgen import canvas
from rlextra.pageCatcher.pageCatcher import copyPages
import Image

Image.new('RGB', (100,100)).save('d:\\pil.pdf')
cvs = canvas.Canvas('d:\\output.pdf')
copyPages('d:\\pil.pdf', cvs)
cvs.save()

This gives me:

D:\>python22\python copy.py
Traceback (most recent call last):
  File "copy.py", line 7, in ?
    copyPages('d:\\pil.pdf', cvs)
  File "_p:rlextra\pageCatcher\pageCatcher.py", line 1242, in copyPages
  File "_p:rlextra\pageCatcher\pageCatcher.py", line 678, in parse
  File "_p:rlextra\pageCatcher\pageCatcher.py", line 784, in getindirectObject
  File "_p:rlextra\pageCatcher\pageCatcher.py", line 822, in gettrue
ValueError: `endobj` keyword not found 1290 '5 0 obj\n<<\n/Length 3'

I'm using ReportLab 1.18, RLExtra 1.17, Python 2.2.3 and PIL 1.1.4 on
Windows 2000 Server.

I am not necessarily asking for pageCatcher to be fixed, since the
problem may be in PIL, but I do need to figure out how to fix my PDF
files, and get PIL to generate valid ones in the future.  Acrobat
Reader does not appear to choke on the PIL-generated files, but
Ghostscript does (which was my backup solution for page merging...)
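
A possible first diagnostic step, sketched under an unconfirmed
assumption: the "`endobj` keyword not found" error might come from a
stream object whose declared /Length does not match the bytes actually
written. The sketch below scans the raw file and reports any stream where
"endstream" does not appear right after the declared number of bytes. The
name check_stream_lengths is hypothetical, the code is written for
current Python 3 rather than the Python 2.2 used above, and it is a
rough regex scan rather than a real PDF parser.

```python
import re

# "N G obj << ...dictionary... >> stream<EOL>", never crossing an "endobj".
_STREAM_HEADER = re.compile(
    rb"(\d+)\s+\d+\s+obj\s*<<((?:(?!endobj).)*?)>>\s*stream(?:\r\n|\n)",
    re.DOTALL,
)
# A direct integer /Length; an indirect reference such as "/Length 6 0 R"
# is deliberately not matched.
_DIRECT_LENGTH = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)")


def check_stream_lengths(data):
    """Return (object number, declared length, offset of first 'endstream')
    for every stream whose declared /Length looks inconsistent.

    Hypothetical helper: assumes a /Length mismatch is the fault to look for.
    """
    suspects = []
    for header in _STREAM_HEADER.finditer(data):
        length = _DIRECT_LENGTH.search(header.group(2))
        if length is None:
            continue  # indirect or missing /Length: not checked here
        declared = int(length.group(1))
        start = header.end()
        # After `declared` bytes there should be an optional EOL, then "endstream".
        tail = data[start + declared:start + declared + 11].lstrip(b"\r\n")
        if not tail.startswith(b"endstream"):
            first = data.find(b"endstream", start)
            offset = first - start if first != -1 else -1
            suspects.append((int(header.group(1)), declared, offset))
    return suspects
```

To try it on the test case above, read d:\pil.pdf in binary mode and
pass the bytes to check_stream_lengths. An empty list would suggest the
stream lengths are not the problem and the cause lies elsewhere.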

Help would be greatly appreciated.

Thanks,

-- 
=Nicholas Riley <njriley@uiuc.edu> | <http://www.uiuc.edu/ph/www/njriley>