[reportlab-users] Writing smaller image-only PDFs

Thu Feb 9 06:37:07 EST 2006

Yes, the images contain only black and white. Does this (or could this)
help compression?
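One way black-and-white content can help is that a bilevel page needs only 1 bit per pixel instead of 8. This sketch builds a synthetic page, packs it to 1 bit per pixel, and deflates both versions with Python's zlib. The page is an assumption: an A4 sheet at 300 dpi, with crude stripes standing in for text. It does not use reportlab, and this thread does not establish whether reportlab keeps a 1-bit image at 1 bit. PDF also has filters aimed specifically at bilevel images (CCITTFaxDecode and JBIG2Decode). A tool like Omnipage may use something of that kind, but that is a guess.

```python
import zlib

# Assumed page size: A4 at 300 dpi, every pixel either black or white.
WIDTH, HEIGHT = 2480, 3508

# Two kinds of 8-bit grayscale rows (0 = black, 255 = white): a blank row and a
# crude "text" row of alternating 7-pixel white and black runs.
BLANK_ROW = bytes([255]) * WIDTH
TEXT_ROW = bytes(0 if (x // 7) % 2 else 255 for x in range(WIDTH))


def pack_1bit(row):
    """Pack an 8-bit gray row into 1 bit per pixel, most significant bit first.

    A set bit means white, matching PDF's default 1-bit DeviceGray convention
    (sample 0 = black, sample 1 = white).
    """
    out = bytearray((len(row) + 7) // 8)
    for x, value in enumerate(row):
        if value >= 128:
            out[x // 8] |= 0x80 >> (x % 8)
    return bytes(out)


# Bands of 20 rows: one band of "text", then two blank bands, repeating.
is_text = [(y // 20) % 3 == 0 for y in range(HEIGHT)]

gray = b"".join(TEXT_ROW if t else BLANK_ROW for t in is_text)
packed_text, packed_blank = pack_1bit(TEXT_ROW), pack_1bit(BLANK_ROW)
packed = b"".join(packed_text if t else packed_blank for t in is_text)

gray_z = zlib.compress(gray, 9)
packed_z = zlib.compress(packed, 9)

print(f"8-bit gray: {len(gray):,} bytes raw, {len(gray_z):,} deflated")
print(f"1-bit:      {len(packed):,} bytes raw, {len(packed_z):,} deflated")
```

The raw 1-bit page is exactly one eighth the size of the 8-bit one before any compression is applied. Real scans are noisier than these stripes, so the deflated sizes here say little about real documents.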

The tool is not doing OCR - it is saving the PDF with images.

Extract the images from the PDFs? Extract them how?

And why is reportlab able to compress 11MB of BMPs into 2MB, but not 
able to compress 5.5MB of JPEGs?
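A likely explanation, assuming reportlab applies a lossless compressor such as Flate to raw bitmap pixels: a BMP usually stores its pixels uncompressed, so there is a lot of redundancy to remove. JPEG data is already compressed, so a second lossless pass gains almost nothing. This sketch imitates both cases with zlib on a made-up, deterministic 1,000,000-byte "bitmap". The data and sizes are illustrative only.

```python
import random
import zlib

# Made-up stand-in for an uncompressed bitmap: mostly white (255) bytes with
# about 5% dark (0) pixels, generated from a fixed seed so runs are repeatable.
rng = random.Random(42)
raw = bytes(0 if rng.random() < 0.05 else 255 for _ in range(1_000_000))

once = zlib.compress(raw, 9)    # like deflating a BMP's raw pixel data
twice = zlib.compress(once, 9)  # like deflating already-compressed data, e.g. a JPEG

print(f"raw pixels:          {len(raw):,} bytes")
print(f"deflated once:       {len(once):,} bytes")
print(f"deflated a 2nd time: {len(twice):,} bytes")
```

Expect the second pass to come out about the same size as the first, or slightly larger, because compressed output has little redundancy left.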

Thanks.

Robin Becker wrote:
> Nicholas Watmough wrote:
>> Each JPEG is about 0.5MB, so the combined size would be about 5.5MB.
>>
>> However, I tried saving the JPEGs individually through the commercial 
>> tool (Omnipage), and the JPEGs were the same size as when saved 
>> through Python. But the image-only PDF produced by Omnipage was 0.4MB,
>> and the one produced through reportlab was 7.8MB.
>>
>> Maybe there is some way to reduce the JPEG file size?
>>
>> Nick
>
> Could it be that your docs are only black and white? A clever tool might
> recognize that and do the appropriate image manipulation. I'm fairly
> sure we just try to respect the image properties, i.e. check for
> gray/RGB/CMYK, so we don't do that.
>
> Since JPEG is native to PDF, we only apply ASCII85 encoding, to make the
> contents more like ASCII. I think we could save a bit by not doing
> that, but not a huge amount. JPEGs are already compressed, and we have
> to specify DCTDecode as well in the image filters.
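ASCII85 turns every 4 bytes into 5 printable characters, so it grows binary data by about 25%, and slightly more once line breaks are added. This sketch measures that with Python's `base64.a85encode` on made-up bytes standing in for JPEG data. The input contains no zero bytes so that no group collapses to the `z` shorthand. As rough arithmetic from this thread's figures, 5.5MB of JPEGs x 1.25 is about 6.9MB. That would account for much of the 7.8MB PDF, but it is an inference, not a measurement.

```python
import base64

# Made-up stand-in for JPEG bytes. No byte is zero, so no 4-byte group can
# collapse to ASCII85's single-character "z" shorthand for four zero bytes.
jpeg_like = bytes((i % 255) + 1 for i in range(400_000))

# a85encode emits 5 characters per 4-byte group. By default it adds no line
# breaks (wrapcol=0) and no "~>" end-of-data marker, which PDF streams carry.
encoded = base64.a85encode(jpeg_like)

print(f"{len(jpeg_like):,} bytes -> {len(encoded):,} characters "
      f"({len(encoded) / len(jpeg_like):.2f}x)")
# prints: 400,000 bytes -> 500,000 characters (1.25x)
```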
>
> Perhaps they're tweaking the JPEG parameters to allow something smaller.
>
> Alternatively, a smart scanner tool could actually do OCR, but I
> suspect it doesn't unless you ask for it.
>
> Have you tried extracting the images from the Omnipage output to see
> how they compare with the inputs?
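Here is one rough way to do that extraction without a full PDF library. The approach finds each stream whose filters include DCTDecode, undoes an ASCII85Decode layer if one is present, and keeps the bytes, which should then be the embedded JPEG file. This is a naive, hypothetical sketch. The regex-based object scan and the `extract_jpegs` name are illustrations. It assumes an unencrypted PDF with no compressed object streams and no other filters, such as FlateDecode, in the chain. A proper PDF parser would be more robust.

```python
import base64
import re

# One indirect object "N G obj", its dictionary, then stream data up to "endstream".
# The tempered token (?:(?!endobj).)*? stops the dictionary match from running
# into the next object when an object has no stream at all.
OBJ_STREAM_RE = re.compile(
    rb"\d+\s+\d+\s+obj((?:(?!endobj).)*?)stream\r?\n(.*?)\s*endstream",
    re.DOTALL,
)


def extract_jpegs(pdf_bytes):
    """Return JPEG byte strings found in DCTDecode image streams (naive sketch)."""
    jpegs = []
    for match in OBJ_STREAM_RE.finditer(pdf_bytes):
        header, data = match.group(1), match.group(2)
        if b"/DCTDecode" not in header:
            continue  # not a JPEG stream
        if b"/FlateDecode" in header:
            continue  # other filter chains are out of scope for this sketch
        if b"/ASCII85Decode" in header:
            # PDF ASCII85 data ends with the "~>" end-of-data marker; drop it,
            # then decode (whitespace such as line breaks is ignored).
            text = data.strip()
            if text.endswith(b"~>"):
                text = text[:-2]
            data = base64.a85decode(text)
        jpegs.append(data)
    return jpegs
```

For example, you could read the Omnipage PDF in binary mode, pass its bytes to `extract_jpegs`, and write each result to its own `.jpg` file. You could then compare those files' sizes and pixel dimensions with the original scans.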