[reportlab-users] Writing smaller image-only PDFs

Robin Becker robin at reportlab.com
Thu Feb 9 05:11:58 EST 2006

Nicholas Watmough wrote:
> Each JPEG is about 0.5MB, so the combined size would be about 5.5MB.
> However, I tried saving the JPEGs individually through the commercial 
> tool (Omnipage), and the JPEGs were the same size as when saved through 
> Python. But the image-only PDF produced by Omnipage was 0.4MB, and the 
> one produced through reportlab was 7.8MB.
> Maybe there is some way to reduce the JPEG file size?
> Nick

Could it be that your documents are only black and white? A clever tool might
recognize that and do the appropriate image manipulation, such as converting the
pages to bilevel (1-bit) images. I'm fairly sure we respect the image properties,
i.e. we check for gray/RGB/CMYK, so we don't do that kind of conversion.
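For anyone who wants to check their own scans, here is a minimal sketch of that kind of gray/RGB/CMYK test. It is a stdlib-only illustration, not ReportLab's actual code, and `jpeg_info` and `COLOUR_SPACES` are hypothetical names. It walks the JPEG marker segments and reads the component count from the start-of-frame (SOF) header: 1 means gray, 3 means RGB/YCbCr, 4 means CMYK. If a "black and white" scan reports 3 components, it is probably carrying colour data the page doesn't need.

```python
import struct

# SOF markers that carry frame parameters (baseline, progressive, lossless, ...).
# 0xC4 (DHT), 0xC8 (reserved) and 0xCC (DAC) fall in the same range but are not frames.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}

# Hypothetical mapping from JPEG component count to a PDF colour space name.
COLOUR_SPACES = {1: "DeviceGray", 3: "DeviceRGB", 4: "DeviceCMYK"}


def jpeg_info(data: bytes):
    """Return (width, height, components, colour_space) from a JPEG's SOF header."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker: skip it
            pos += 1
            continue
        if marker in (0xD8, 0x01) or 0xD0 <= marker <= 0xD7:  # markers with no length field
            pos += 2
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # length includes itself
        if marker in SOF_MARKERS:
            _precision, height, width, ncomp = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            return width, height, ncomp, COLOUR_SPACES.get(ncomp, "unknown")
        if marker == 0xDA:  # start of scan reached without seeing a frame header
            break
        pos += 2 + length
    raise ValueError("no SOF marker found")
```

Usage would be something like `jpeg_info(open("page01.jpg", "rb").read())` on each input page (the file name is only an example).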

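Since JPEG is a native PDF image format, the only extra encoding we apply is
ASCII85, to make the stream contents plain ASCII. ASCII85 writes every 4 bytes
as 5 characters, so I think we could save about 20% of the image data by not
doing that, but that is nowhere near the difference you're seeing. JPEGs are
already compressed, so we don't recompress them; we just list DCTDecode
alongside ASCII85Decode in the image filters.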
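To put a number on the ASCII85 cost, the sketch below uses the standard library's `base64.a85encode` to measure the growth on already-compressed-looking data. `ascii85_overhead` and the two `FILTER_*` strings are hypothetical names for illustration. The strings show how an image stream's `/Filter` entry would read with and without the ASCII85 layer.

```python
import base64


def ascii85_overhead(raw: bytes) -> float:
    """Ratio of ASCII85-encoded size to raw size.

    base64.a85encode is called without Adobe '<~' framing; in a PDF stream only
    the two-byte '~>' end marker is added, which is ignored here.
    """
    if not raw:
        return 1.0
    return len(base64.a85encode(raw)) / len(raw)


# /Filter arrays are listed in decoding order: a reader first undoes ASCII85,
# then passes the resulting JPEG bytes to the DCT (JPEG) decoder.
FILTER_WITH_A85 = "/Filter [/ASCII85Decode /DCTDecode]"
FILTER_BINARY = "/Filter /DCTDecode"  # same image, stored as raw binary

if __name__ == "__main__":
    # Deterministic stand-in for compressed JPEG bytes. It avoids zero bytes because
    # ASCII85 folds an all-zero 4-byte group into a single 'z', which would skew the ratio.
    sample = bytes((i * 37 + 11) % 255 + 1 for i in range(500_000))
    print(f"ASCII85 overhead: {ascii85_overhead(sample):.2f}x")  # prints 1.25x
```

By that arithmetic, about 5.5MB of JPEGs becomes roughly 6.9MB once ASCII85-encoded. The encoding alone therefore can't account for the gap to a 0.4MB file.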

Perhaps they're tweaking the JPEG parameters to get something smaller.

Alternatively, a smart scanner tool could actually do OCR, but I suspect they
don't unless you ask for it.

Have you tried extracting the images from the Omnipage output to see how they
compare with the inputs?
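One quick and admittedly naive way to do that extraction is to scan the PDF bytes for JPEG start-of-image/end-of-image markers and write out each match. This sketch has limits. It only finds JPEGs stored as raw DCTDecode streams, so images wrapped in ASCII85 or Flate won't be found. A JPEG with an embedded EXIF thumbnail may also be cut short at the thumbnail's end marker. `extract_raw_jpegs` and the `extracted_NN.jpg` file names are hypothetical.

```python
import sys
from pathlib import Path

SOI = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the 0xFF that begins the next marker
EOI = b"\xff\xd9"      # JPEG end-of-image marker


def extract_raw_jpegs(pdf_bytes: bytes) -> list[bytes]:
    """Return every SOI..EOI byte run found in the PDF (naive marker scan)."""
    images = []
    start = pdf_bytes.find(SOI)
    while start != -1:
        end = pdf_bytes.find(EOI, start + len(SOI))
        if end == -1:
            break  # truncated image: no end marker after this start
        images.append(pdf_bytes[start:end + len(EOI)])
        start = pdf_bytes.find(SOI, end + len(EOI))
    return images


if __name__ == "__main__" and len(sys.argv) == 2:
    found = extract_raw_jpegs(Path(sys.argv[1]).read_bytes())
    for n, jpg in enumerate(found, 1):
        out = Path(f"extracted_{n:02d}.jpg")
        out.write_bytes(jpg)
        print(f"{out}: {len(jpg)} bytes")
    if not found:
        print("no raw JPEG streams found")
```

If it finds nothing in the Omnipage file, the images may not be JPEGs at all. Bilevel scans are often stored with PDF's CCITTFaxDecode or JBIG2Decode filters, which would fit the black and white theory.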
Robin Becker
