[reportlab-users] Writing smaller image-only PDFs

Thu Feb 9 05:11:58 EST 2006

Nicholas Watmough wrote:
> Each JPEG is about 0.5MB, so the combined size would be about 5.5MB.
> 
> However, I tried saving the JPEGs individually through the commercial 
> tool (Omnipage), and the JPEGs were the same size as when saved through 
> Python. But the image-only PDF produced by Omnipage was 0.4MB, and the 
> one produced through reportlab was 7.8MB.
> 
> Maybe there is some way to reduce the JPEG file size?
> 
> Nick
> 
>

Could it be that your docs are only black/white? A clever tool might recognize 
that and do the appropriate image manipulation, such as reducing the pages to 
1-bit images. I'm fairly sure we try to respect the image properties, i.e. we 
check for gray/RGB/CMYK, so we don't do that kind of conversion.
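
To see which colour model a given JPEG actually declares, one can read the 
component count from its frame header. The following is only a minimal 
standalone sketch, not reportlab's own code; the function name 
jpeg_frame_info is made up for illustration. One component usually means 
grayscale, three means RGB/YCbCr and four means CMYK.

```python
import struct

# Start-of-frame markers (0xC4, 0xC8 and 0xCC are not SOF markers).
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_frame_info(data: bytes):
    """Return (width, height, bits_per_sample, n_components) of a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("lost sync with marker segments")
        marker = data[pos + 1]
        if marker == 0xFF:
            # Fill byte before a marker; skip it.
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            # Standalone markers carry no length field.
            pos += 2
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            # Frame header: precision, height, width, component count.
            bits, height, width, ncomp = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            return width, height, bits, ncomp
        # The length field counts itself but not the 2-byte marker.
        pos += 2 + length
    raise ValueError("no SOF segment found")


# Usage (the path is a placeholder):
# with open("page01.jpg", "rb") as fh:
#     print(jpeg_frame_info(fh.read()))
```

If the scans report three 8-bit components although the pages are plain 
black text on white, that would support the idea that the other tool is 
converting them to something smaller.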

Since JPEG is a native format for PDF, the only encoding we add is ASCII85, to 
make the contents more like ASCII. ASCII85 writes 5 characters for every 4 
bytes, so I think we could save about 20% of the stream size by not doing 
that, but that is nowhere near the difference you are seeing. JPEGs are 
already compressed, and we have to specify DCTDecode as well in the image 
filters.
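
As a rough check on the cost of the ASCII85 step, the sketch below encodes 
some stand-in data with Python's standard base64.a85encode and reports the 
size ratio. The data is synthetic (it is not a real JPEG) and the function 
name ascii85_ratio is made up for illustration.

```python
import base64


def ascii85_ratio(raw: bytes) -> float:
    """Return encoded_size / raw_size after ASCII85-encoding `raw`."""
    return len(base64.a85encode(raw)) / len(raw)


# Stand-in for compressed image data. It contains no zero bytes, so the
# encoder's 'z' shortcut for all-zero groups is never triggered.
sample = bytes((i * 37 + 11) % 251 + 1 for i in range(400_000))
ratio = ascii85_ratio(sample)
print(f"ASCII85 size ratio: {ratio:.2f}")
```

At 5 output characters per 4 input bytes, about 5.5MB of JPEG data would 
grow to roughly 6.9MB, so the encoding explains part of the 7.8MB file but 
not the gap to 0.4MB.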

Perhaps they're tweaking the jpeg parameters to allow something smaller.

Alternatively a smart scanner tool could actually do OCR, but I suspect they 
don't unless you ask for it.

Have you tried extracting the images from the Omnipage output to see how they 
compare with the inputs?
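
Short of a full extraction, a quick first look is to count which stream 
filters each PDF names. The sketch below simply scans the raw bytes for the 
standard filter names; it is a crude heuristic, assuming the image 
dictionaries are stored uncompressed, and count_filters is a made-up helper 
name. Seeing CCITTFaxDecode or JBIG2Decode instead of DCTDecode would 
suggest the pages were stored as 1-bit images.

```python
import re
from collections import Counter

# Stream filter names defined by the PDF specification.
FILTERS = (
    b"ASCIIHexDecode", b"ASCII85Decode", b"LZWDecode", b"FlateDecode",
    b"RunLengthDecode", b"CCITTFaxDecode", b"JBIG2Decode", b"DCTDecode",
    b"JPXDecode",
)
_FILTER_RE = re.compile(rb"/(" + b"|".join(FILTERS) + rb")\b")


def count_filters(pdf_bytes: bytes) -> Counter:
    """Count occurrences of each filter name in the raw bytes of a PDF."""
    return Counter(
        m.group(1).decode("ascii") for m in _FILTER_RE.finditer(pdf_bytes)
    )


# Tiny synthetic fragment standing in for two image dictionaries.
sample = (
    b"<< /Subtype /Image /Filter [ /ASCII85Decode /DCTDecode ] >>\n"
    b"<< /Subtype /Image /Filter /CCITTFaxDecode >>\n"
)
print(count_filters(sample))

# For real files (the path is a placeholder):
# with open("omnipage.pdf", "rb") as fh:
#     print(count_filters(fh.read()))
```

Running this on both files and comparing the counts should show quickly 
whether the two tools store the page images the same way.
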
-- 
Robin Becker