[reportlab-users] Writing smaller image-only PDFs

Wed Feb 8 23:09:22 EST 2006

Hi,

I am trying to scan a number of pages, then write the output to a PDF 
using reportlab. The size of the PDFs generated is much larger than 
would seem necessary, but I'm not sure why. I've tried to reduce the 
file size, but it doesn't seem to work.

I am using the Python TWAIN module to scan the images, which returns the
images in BMP format. I use the Python Imaging Library (PIL) to open the
BMP, and write the PIL object to the PDF using drawInlineImage(). I tried
switching to drawImage(), which required me to wrap the image in an
ImageReader object, but this did not decrease the output PDF file size.

The produced PDF file size was approx 2MB.

I tried to reduce this by using JPEGs. So I saved my BMPs as JPEGs
(using a StringIO object), then reopened the JPEGs using the PIL, and
wrote the PIL object to the PDF using drawInlineImage().

The resulting PDF file size was 7.8MB.

When I turned on page compression, the file size was reduced to 6.8MB.
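
For reference, here is a minimal sketch of how zlib (Flate) treats uniform bytes compared with noisy ones. It assumes that page compression amounts to zlib applied to the page content stream, which has not been verified here. It uses synthetic data rather than the actual scans, `flate_ratio` is a made-up helper name, and it is not a confirmed explanation of the 7.8MB to 6.8MB result.

```python
import os
import zlib


def flate_ratio(data):
    """Return compressed size divided by original size, using zlib."""
    return len(zlib.compress(data, 9)) / float(len(data))


size = 100000
flat = b"\xff" * size       # stand-in for a clean, uniform scan area
noisy = os.urandom(size)    # stand-in for high-entropy image data

# Uniform data shrinks to almost nothing; noisy data barely shrinks at all.
print("flat:  %.3f" % flate_ratio(flat))
print("noisy: %.3f" % flate_ratio(noisy))
```

If the image data reaching the page stream looks more like the noisy case, a second compression pass would be expected to save only a little.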

The PDF I am generating only has 11 pages (11 images).

When I do the same thing using a commercial tool (Omnipage) for both the
scanning and the production of the image-only PDF, the resultant file
size is 0.4MB.
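
To put the figures above side by side, the sketch below works out the average size per page and the ratio to the Omnipage result. It is only arithmetic on the approximate sizes quoted in this message. The helper names `per_page_kb` and `ratio_to` are made up for illustration, and 1 MB is taken as 1000 KB.

```python
PAGES = 11  # pages (images) in the PDF

# Approximate total sizes quoted in this message, in MB.
sizes_mb = {
    "PIL object from BMP": 2.0,
    "PIL object from JPEG": 7.8,
    "PIL object from JPEG, page compression on": 6.8,
    "Omnipage": 0.4,
}


def per_page_kb(total_mb, pages=PAGES):
    """Average size per page in KB, taking 1 MB as 1000 KB."""
    return total_mb * 1000.0 / pages


def ratio_to(baseline_mb, total_mb):
    """How many times larger total_mb is than baseline_mb."""
    return total_mb / baseline_mb


baseline = sizes_mb["Omnipage"]
for label, mb in sizes_mb.items():
    print("%-42s %7.1f KB/page  %5.1fx Omnipage"
          % (label, per_page_kb(mb), ratio_to(baseline, mb)))
```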

While I realise that an open source tool may not be able to achieve the 
same reduction level as a commercial tool, the file sizes I am getting 
using Python seem too large. Particularly as I am getting larger output 
for JPEGs than I am for BMPs.

Does anyone know how I can reduce the file size of my produced PDFs? I
suspect I may be doing something wrong with the JPEGs, but I'm not really sure.

Nick