[reportlab-users] pageCatcher doesn't read PDFs created by PIL
Thu, 20 Nov 2003 21:52:11 -0600
I'm writing a document management system in Python. It scans pages,
makes PDFs of them with PIL, writes the pages to a database, and later
constructs aggregate PDFs on demand from a Web interface, using pageCatcher.
I've had no trouble combining various PDF files using pageCatcher,
except when I try to include PIL-generated PDFs. My guess is that PIL
is doing something wrong in generating the PDFs, but I'm in the
unfortunate position of having several hundred images in the database
that I can't get out now. I can't even convert the pages back to
another format, because PIL can only write PDF, not read it.
Here's a very simple test case:
from reportlab.pdfgen import canvas
from rlextra.pageCatcher.pageCatcher import copyPages
cvs = canvas.Canvas('d:\\output.pdf')
This gives me:
Traceback (most recent call last):
  File "copy.py", line 7, in ?
  File "_p:rlextra\pageCatcher\pageCatcher.py", line 1242, in copyPages
  File "_p:rlextra\pageCatcher\pageCatcher.py", line 678, in parse
  File "_p:rlextra\pageCatcher\pageCatcher.py", line 784, in getindirectObject
  File "_p:rlextra\pageCatcher\pageCatcher.py", line 822, in gettrue
ValueError: `endobj` keyword not found 1290 '5 0 obj\n<<\n/Length 3'
I'm using ReportLab 1.18, RLExtra 1.17, Python 2.2.3 and PIL 1.1.4 on
Windows 2000 Server.
I am not necessarily asking for pageCatcher to be fixed, since the
problem may be in PIL, but I do need to figure out how to fix my PDF
files, and get PIL to generate valid ones in the future. Acrobat
Reader does not appear to choke on the PIL-generated files, but
Ghostscript does (which was my backup solution for page merging...)
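To narrow down where the files go wrong, a rough structural check along the following lines might help. It is only a sketch, written for a current Python 3 rather than the 2.2.3 mentioned above, and it assumes the whole file fits in memory. The name check_pdf_objects and both regular expressions are illustrative; they are not part of ReportLab, pageCatcher, or PIL. It walks the "N G obj" headers and reports objects with no endobj before the next header, and streams whose direct /Length value does not land on endstream. Binary stream data that happens to contain these keywords can fool it.

```python
import re

# "N G obj" header of an indirect object, e.g. b"5 0 obj".
OBJ_HEADER = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
# A direct /Length value; "/Length 6 0 R" (an indirect reference) is skipped.
DIRECT_LENGTH = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R\b)")


def check_pdf_objects(data):
    """Return (object_number, problem) tuples found in raw PDF bytes."""
    problems = []
    pos = 0
    while True:
        header = OBJ_HEADER.search(data, pos)
        if header is None:
            break
        num = int(header.group(1))
        scan_from = header.end()

        # The object has a stream if "stream" comes before both the next
        # "endobj" and the next object header.
        stream_at = data.find(b"stream", scan_from)
        next_end = data.find(b"endobj", scan_from)
        next_hdr = OBJ_HEADER.search(data, scan_from)
        limit = min(
            next_end if next_end != -1 else len(data),
            next_hdr.start() if next_hdr else len(data),
        )
        if stream_at != -1 and stream_at < limit:
            body = stream_at + len(b"stream")
            # Skip the single end-of-line that follows the keyword.
            if data[body:body + 2] == b"\r\n":
                body += 2
            elif data[body:body + 1] in (b"\n", b"\r"):
                body += 1
            endstream_at = data.find(b"endstream", body)
            if endstream_at == -1:
                problems.append((num, "endstream keyword not found"))
                break
            scan_from = endstream_at
            length = DIRECT_LENGTH.search(data[header.end():stream_at])
            if length is not None:
                declared = int(length.group(1))
                # Allow an optional end-of-line between data and keyword.
                tail = data[body + declared:body + declared + 11]
                if tail.lstrip(b"\r\n").startswith(b"endstream"):
                    scan_from = body + declared
                else:
                    problems.append(
                        (num, "/Length %d does not end at endstream" % declared)
                    )

        # The object must close with endobj before the next header starts.
        end_at = data.find(b"endobj", scan_from)
        next_hdr = OBJ_HEADER.search(data, scan_from)
        if end_at == -1 or (next_hdr is not None and next_hdr.start() < end_at):
            problems.append((num, "endobj keyword not found"))
            if next_hdr is None:
                break
            pos = next_hdr.start()
        else:
            pos = end_at + len(b"endobj")
    return problems
```

It would be called as check_pdf_objects(open(path, 'rb').read()), where path stands for one of the affected files. An empty result only means this sketch found nothing, not that the file is valid.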
Help would be greatly appreciated.
=Nicholas Riley <email@example.com> | <http://www.uiuc.edu/ph/www/njriley>