[reportlab-users] Writing smaller image-only PDFs

Nathan nathan.stocks at gmail.com
Thu Feb 9 14:00:55 EST 2006


On 2/9/06, Chris Jerdonek <jerdonek at gmail.com> wrote:
> > Date: Thu, 09 Feb 2006 17:27:50 +1100
> > From: Nicholas Watmough <nickw at deakin.edu.au>
> > Subject: Re: [reportlab-users] Writing smaller image-only PDFs
> > To: Support list for users of Reportlab software
> >       <reportlab-users at reportlab.com>
> > Message-ID: <43EAE0E6.1010804 at deakin.edu.au>
> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> >
> > Each JPEG is about 0.5MB, so the combined size would be about 5.5MB.
> >
> > However, I tried saving the JPEGs individually through the commercial
> > tool (Omnipage), and the JPEGs were the same size as when saved through
> > Python. But the image-only PDF produced by Omnipage was 0.4MB, and the
> > one produced through reportlab was 7.8MB.
>
> How big is the Omnipage file if you first save them as JPEGs and then
> create the Omnipage PDF from that?  I bet it will also be big.  It
> sounds like all the compression is happening in the scanning process.
>
> --Chris

You guys are driving me nuts, beating around the issues.  Class is now
in session:

JPEGs store continuous-tone information (full color, or at best 8-bit
grayscale) - always; there is no 1-bit black & white mode.  The
lossiness vs. quality of a JPEG file is configurable, but the
resulting file is about as compressed as it can get while preserving
the chosen amount of that information.  You can't take something
that's already been compressed and compress it much further with a
lossless format [compressed data looks like random bits, and the more
random a bit stream is, the less you can compress it without losing
info].  The JPEG format is optimized for full-color photographs, and
stores the color information in a highly compressed form.  It's
really, really bad [file-size-wise] for images of mostly one color,
such as scanned pages of text.

Bitmaps (BMPs) are a simple, lossless format.  You can set bitmaps to
store only grayscale or only black & white, which cuts the bits per
pixel and the file size along with it.  Bitmap files usually aren't
compressed at all!  The format is basically....
"WHITEPIXEL WHITEPIXEL WHITEPIXEL WHITEPIXEL BLACKPIXEL WHITEPIXEL
WHITEPIXEL WHITEPIXEL BLUEPIXEL etc."
External compression programs can compress bitmaps significantly.  Even
compressed, bitmaps can still be much larger than some other image
formats, since they preserve all the information for every pixel.
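
As a rough worked example of how bit depth and lossless compression
affect raw pixel data, the sketch below assumes a US Letter page scanned
at 300 dpi (2550 x 3300 pixels; these numbers are illustrative, not from
the thread) and a synthetic mostly-white page.  It computes raw pixel
sizes only and ignores BMP headers and BMP's own row padding rules.

```python
import zlib

WIDTH, HEIGHT = 2550, 3300  # assumed: US Letter page at 300 dpi


def raw_size_bytes(width, height, bits_per_pixel):
    """Uncompressed pixel data size, each row padded to a whole byte."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height


for label, bpp in (("24-bit color", 24),
                   ("8-bit grayscale", 8),
                   ("1-bit black & white", 1)):
    print(f"{label}: {raw_size_bytes(WIDTH, HEIGHT, bpp)} bytes")

# Synthetic 8-bit page: white everywhere, with a black bar on every 50th row.
white_row = b"\xff" * WIDTH
ruled_row = b"\xff" * 300 + b"\x00" * (WIDTH - 600) + b"\xff" * 300
page = b"".join(ruled_row if y % 50 == 0 else white_row
                for y in range(HEIGHT))

packed = zlib.compress(page, 9)
print(f"synthetic page: {len(page)} -> {len(packed)} bytes")
```

A real scan has noise and irregular text, so it will not compress nearly
as well as this synthetic page, but the direction of the effect is the
same.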

PDFs (I know less about PDF internals than image format internals), as
far as I understand, just embed whatever you give them, be it text,
image files, or vectors--although the format supports a basic lossless
compression algorithm that it can use internally.  As far as I know,
any specific "image compression", especially lossy image compression,
has to be done before you give the image to the PDF generator.
JPEGs will hardly compress at all with a lossless algorithm, because
they are already compressed.  BMPs will benefit hugely from lossless
compression, because they are hardly compressed at all to begin
with.

In the case originally mentioned in this thread, it's extremely likely
that Omnipage is _not_ just a PDF generator.  First, Omnipage converts
each image to a different image format (2-color GIF, for example),
which throws out the color information and compresses very well.  The
small images from the image-conversion subcomponent are then given to
whatever subcomponent generates the PDFs, and voila!  Small PDF.

Convert your own images to something nice and small, and reportlab
ought to generate a small PDF for you.  End of story.
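
One way this conversion step might look, as a sketch only: it assumes
Pillow and reportlab are installed, and the function names, the 300 dpi
default, and the threshold of 128 are all hypothetical choices for
illustration.  Each scan is reduced to grayscale and hard-thresholded to
pure black and white before it is handed to reportlab, so the embedded
pixel data is highly repetitive and compresses well.  How small the
result gets depends on the scans and on how your reportlab version
stores images.

```python
def page_size_points(width_px, height_px, dpi):
    """Convert pixel dimensions to PDF points (1 point = 1/72 inch)."""
    return (width_px * 72.0 / dpi, height_px * 72.0 / dpi)


def scans_to_pdf(image_paths, pdf_path, dpi=300, threshold=128):
    """Write one thresholded black & white page per input image."""
    # Third-party imports are local so the helper above works without them.
    from PIL import Image
    from reportlab.lib.utils import ImageReader
    from reportlab.pdfgen import canvas

    pdf = canvas.Canvas(pdf_path)
    for path in image_paths:
        # Grayscale first, then a hard threshold: drops color and JPEG noise.
        gray = Image.open(path).convert("L")
        flat = gray.point(lambda p: 255 if p >= threshold else 0)
        width_pt, height_pt = page_size_points(flat.width, flat.height, dpi)
        pdf.setPageSize((width_pt, height_pt))
        pdf.drawImage(ImageReader(flat), 0, 0,
                      width=width_pt, height=height_pt)
        pdf.showPage()
    pdf.save()
```

Usage would be something like scans_to_pdf(["page1.jpg", "page2.jpg"],
"out.pdf"), where the file names are placeholders.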

Yes, this is a high-level view.  Yes, I could well be wrong on some of
the specifics (feel free to correct me if you _know_).  No, I'm not
trying to offend anybody.  Your mileage may vary.  Void in the
following states: denial, insanity, police.

~ Nathan

