[reportlab-users] Writing smaller image-only PDFs
robin at reportlab.com
Thu Feb 9 05:11:58 EST 2006
Nicholas Watmough wrote:
> Each JPEG is about 0.5MB, so the combined size would be about 5.5MB.
> However, I tried saving the JPEGs individually through the commercial
> tool (Omnipage), and the JPEGs were the same size as when saved through
> Python. But the image-only PDF produced by Omnipage was 0.4MB, and the
> one produced through reportlab was 7.8MB.
> Maybe there is some way to reduce the JPEG file size?
Could it be that your docs are only black/white? A clever tool might recognize
that and do the appropriate image manipulation. I'm fairly sure we try to
respect the image properties, i.e. we check for gray/rgb/cmyk, so we don't do
that kind of conversion ourselves.
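One rough way to test the black/white guess is sketched below. It is only an
illustration: the function name, the tolerance and the 98% cut-off are
assumptions, not anything reportlab does. It takes 8-bit gray values, which
with Pillow could come from Image.open(path).convert("L").tobytes().

```python
def looks_bilevel(gray_pixels, tolerance=32, min_fraction=0.98):
    """Return True if nearly all 8-bit gray values sit close to 0 or 255.

    tolerance and min_fraction are arbitrary illustrative defaults; JPEG
    artifacts mean a scanned black/white page is rarely exactly 0 or 255.
    """
    pixels = list(gray_pixels)
    if not pixels:
        return False
    near_extremes = sum(
        1 for p in pixels if p <= tolerance or p >= 255 - tolerance
    )
    return near_extremes / len(pixels) >= min_fraction


# A page of near-black and near-white samples passes the check,
# a smooth gray ramp does not.
print(looks_bilevel([0, 3, 250, 255, 10, 245]))
print(looks_bilevel(range(256)))
```

If the scans pass a check like this, storing them as 1-bit images rather than
as gray or RGB JPEGs is one plausible source of a much smaller file.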
Since JPEG is native to PDF, the only encoding we add is ASCII85, to make the
contents more like ASCII. I think we could save a bit by not doing that, but
not a huge amount. JPEGs are already compressed, and we have to specify
DCTDecode as well in the image filters.
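To put a number on "a bit": ASCII85 writes every 4 input bytes as 5 characters,
so the encoded stream is about 25% larger. The sketch below measures that with
the standard library's base64.a85encode; the sample bytes are a made-up
stand-in for JPEG data, and the helper name is an assumption.

```python
import base64


def ascii85_overhead(data):
    """Return encoded size divided by raw size for ASCII85."""
    encoded = base64.a85encode(data)
    return len(encoded) / len(data)


# Stand-in payload (not a real JPEG): 4000 bytes with no run of four zero
# bytes, so the 'z' shortcut for all-zero groups never applies.
sample = bytes((i * 37 + 11) % 256 for i in range(4000))
print(ascii85_overhead(sample))  # 1.25
```

Dropping the ASCII85 step would therefore recover roughly a fifth of the file,
which is nowhere near the difference between 7.8MB and 0.4MB.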
Perhaps they're tweaking the jpeg parameters to allow something smaller.
Alternatively a smart scanner tool could actually do OCR, but I suspect they
don't unless you ask for it.
Have you tried extracting the images from the Omnipage output to see how they
compare with the inputs?
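Short of a full extraction, a crude first look is to count which stream filter
names appear in each PDF. The sketch below is a raw byte scan, not a PDF
parser: it counts every occurrence of a name (not only those on image
objects) and would miss names stored inside compressed object streams. The
function name and the sample bytes are assumptions for illustration.

```python
import re
from collections import Counter

# Standard PDF stream filter names worth looking for.
FILTER_NAMES = (
    b"DCTDecode", b"CCITTFaxDecode", b"JBIG2Decode", b"FlateDecode",
    b"LZWDecode", b"RunLengthDecode", b"ASCII85Decode", b"ASCIIHexDecode",
)


def count_filters(pdf_bytes):
    """Count occurrences of each filter name in the raw PDF bytes."""
    counts = Counter()
    for name in FILTER_NAMES:
        pattern = rb"/" + name + rb"\b"
        counts[name.decode("ascii")] = len(re.findall(pattern, pdf_bytes))
    return counts


# Made-up fragment of an image dictionary, as it might appear in a PDF.
sample = b"<< /Subtype /Image /Filter [ /ASCII85Decode /DCTDecode ] >>"
for name, n in count_filters(sample).items():
    if n:
        print(name, n)
```

For a real file, pass open(path, "rb").read(). If the Omnipage file shows
CCITTFaxDecode or JBIG2Decode where ours shows DCTDecode, the images were
re-encoded as 1-bit data rather than kept as JPEGs.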