[reportlab-users] Optimizing greyscale and bilevel images

Robin Becker robin at reportlab.com
Tue Jan 3 11:24:33 EST 2012


We have begun raising the minimum PDF version when features that require
1.4 (e.g. transparency) are recognized, for example:

canvas.py: self._doc.ensureMinPdfVersion('transparency')

If this particular format needs 1.4 then we can do the same.
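
The idea can be sketched roughly as follows. This is a simplified, stand-alone
illustration and not the actual reportlab code; the class, the table and the
'jbig2' key are made up for the example.

```python
# Hypothetical table: feature keyword -> minimum PDF version that supports it.
MIN_VERSION_FOR = {
    "transparency": (1, 4),
    "jbig2": (1, 4),  # assumed key, for illustration only
}


class Doc:
    """Toy stand-in for a PDF document that tracks its own version."""

    def __init__(self):
        self.pdf_version = (1, 3)  # the default version mentioned in this thread

    def ensure_min_pdf_version(self, *features):
        # Raise the version only when a requested feature needs a newer one.
        for feature in features:
            self.pdf_version = max(self.pdf_version, MIN_VERSION_FOR[feature])


doc = Doc()
doc.ensure_min_pdf_version("jbig2")
print("%d.%d" % doc.pdf_version)
```

Because the version is only ever raised, several features can register their
requirements in any order.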

However, we need to be able to recognize these images and switch to the
appropriate filter later on. Currently we recognize JPEGs fairly early on so
as not to convert back and forth in PIL. In the JBIG2 case we'll need some
means of failing gently when PIL falls over, or of recognizing the image
before it gets to PIL. At present the ImageReader class tries to read into PIL
even for JPEGs. We'll probably need to do some rewriting to avoid that if
possible and check for JBIG2 early.
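
One way to check early is to look at the first few bytes of the file before
PIL is involved. A minimal sketch, assuming stand-alone JBIG2 files that carry
the usual 8-byte file header; the function names are hypothetical and not part
of reportlab:

```python
JPEG_MAGIC = b"\xff\xd8"            # JPEG start-of-image marker
JBIG2_MAGIC = b"\x97JB2\r\n\x1a\n"  # JBIG2 file header ID string


def sniff_image_format(head):
    """Return 'jpeg', 'jbig2' or None from the first bytes of a file."""
    if head.startswith(JPEG_MAGIC):
        return "jpeg"
    if head.startswith(JBIG2_MAGIC):
        return "jbig2"
    return None  # unknown: let PIL try, and handle its failure gracefully


def sniff_file(path):
    """Read just enough of the file to identify it."""
    with open(path, "rb") as f:
        return sniff_image_format(f.read(8))
```

JBIG2 data without the file header would not be detected this way.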
--
Robin Becker



On 03/01/2012 15:45, Jesus Cea wrote:


> On 31/12/11 20:18, Glenn Linderman wrote:
>> In my work with bilevel images, I have found that TIFF with Group
>> 4 compression seems to produce the smallest files with lossless
>> compression, or DJVU even smaller, but I think just slightly lossy.
>> I think the bilevel compression used by PDF is also Group 4?
>
> PDF 1.4 and higher support JBIG2 natively.
>
>> Am I correct that PNG doesn't support Group 4 compression, or am
>> I missing an option that would allow bilevel PNG files to be as
>> small as TIFF/Group 4?
>

> The problem is this:
>
> Current ReportLab takes the picture and, if it is a JPEG file, includes
> it as is in the PDF (if it is a greyscale JPEG, it exploits that). But if
> the image source is a PIL object, it ALWAYS converts it to RGB, exports
> the raw pixel data and simply applies ZLIB (deflate) compression to the
> pixel data.
>

> So using a PIL "true color" image with reportlab will produce huge
> files. It is better to save the image as JPEG and import it into
> reportlab as a JPEG file, not as a PIL object.
>

> But because of this internal conversion, non "true color" images, like
> greyscale, bilevel or indexed images, waste a lot of space. If your image
> is a pie chart with ten colors, it will be inserted in the PDF as an RGB
> blob (24 bits per pixel) with the only "improvement" being a ZLIB
> wrapping. This is wasteful and slow.
>
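
For a sense of scale, the raw sizes (before deflate) can be worked out per
mode. A small sketch, assuming PIL's mode names '1', 'L' and 'RGB' and a width
that is a multiple of 8; the function name and the 1000x1000 size are
illustrative only:

```python
# Bits per pixel for the PIL modes discussed above, before any compression.
BITS_PER_PIXEL = {"1": 1, "L": 8, "RGB": 24}


def raw_size(mode, width, height):
    """Uncompressed sample bytes, assuming width is a multiple of 8."""
    return width * height * BITS_PER_PIXEL[mode] // 8


for mode in ("1", "L", "RGB"):
    print(mode, raw_size(mode, 1000, 1000))
```

So a bilevel image converted to RGB hands zlib 24 times as much data, and a
greyscale one 3 times as much.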

> Here we have a few choices:
>
> 1. If I remember correctly, when the current code is given an image
> filename, it includes the JPEG directly if it is a JPEG. But if the
> file has ANY other format (PNG, for instance), it will be imported into
> PIL, converted to RGB and inserted as an RGB blob+ZLIB. Besides the size
> expansion, you are limited to formats that PIL recognizes.
>

> It would be nice, for instance, to be able to insert JBIG2 files or
> TIFF/Group 4 files. Recent versions of the PDF standard support them
> natively, so reportlab doesn't need to decode the image; it can insert
> the file in the PDF with minimal header parsing, if any (like it
> already does with JPEG files). This is nice because, for instance, we
> don't have to worry about patents.
>
> Reportlab is generating PDF 1.3 files. I don't know what could go
> wrong if we increase the version to 1.4, which allows native JBIG2.
>

> 2. When inserting a PIL image, current reportlab always converts it to
> RGB. I think the lib should natively support bilevel, greyscale and
> indexed PIL images. Seems easy enough.
>
> I have written a small and trivial patch to support bilevel (with a
> width that is a multiple of 8) and greyscale PIL images. I have patched
> "drawInlineImage" because it was way easier than "drawImage" and
> enough for my immediate needs. I don't understand the PDF standard's
> description well enough to implement indexed images.
>
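
On indexed images: one reading of the PDF specification is that an inline
image may use an Indexed colour space whose base is a device colour space and
whose lookup table is given as a string. A rough sketch of building that
entry from a palette follows. The helper name is made up and this has not
been tested against reportlab, so treat it as an assumption to verify.

```python
def indexed_colorspace(palette):
    """Build an inline-image /CS entry for an RGB palette.

    palette: list of (r, g, b) tuples, components 0..255, at most 256 entries.
    """
    if not 1 <= len(palette) <= 256:
        raise ValueError("palette must have 1..256 entries")
    hival = len(palette) - 1  # highest valid palette index
    lookup = "".join("%02x%02x%02x" % rgb for rgb in palette)
    # /I and /RGB are the inline-image abbreviations for Indexed and DeviceRGB.
    return "/CS [/I /RGB %d <%s>]" % (hival, lookup)


print(indexed_colorspace([(255, 0, 0), (0, 255, 0), (0, 0, 255)]))
```

The sample data would then be one palette index per pixel (with /BPC 8 for up
to 256 colours) instead of three bytes per pixel.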

> My patch reduces my generated PDF files to half size, so I am happy,
> but I can't invest any more time in this. Holidays are over :).
>
> My patch:
>

> """
> jcea@ubuntu:/usr/lib/python2.6/dist-packages/reportlab/pdfgen$ diff -u pdfimages.py.OLD pdfimages.py
> --- pdfimages.py.OLD 2009-02-03 22:26:43.000000000 +0100
> +++ pdfimages.py 2011-12-29 00:54:36.923812086 +0100
> @@ -100,21 +100,30 @@
>          if image.mode == 'CMYK':
>              myimage = image
>              colorSpace = 'DeviceCMYK'
> -            bpp = 4
> +            bpp = 4*8
> +        elif image.mode == '1' :
> +            myimage = image
> +            colorSpace = 'DeviceGray'
> +            bpp = 1
> +        elif image.mode == 'L' :
> +            myimage = image
> +            colorSpace = 'DeviceGray'
> +            bpp = 1*8
>          else:
>              myimage = image.convert('RGB')
>              colorSpace = 'RGB'
> -            bpp = 3
> +            bpp = 3*8
>          imgwidth, imgheight = myimage.size
>
>          # this describes what is in the image itself
>          # *NB* according to the spec you can only use the short form in inline images
>          #imagedata=['BI /Width %d /Height /BitsPerComponent 8 /ColorSpace /%s /Filter [/Filter [ /ASCII85Decode /FlateDecode] ID]' % (imgwidth, imgheight,'RGB')]
> -        imagedata=['BI /W %d /H %d /BPC 8 /CS /%s /F [/A85 /Fl] ID' % (imgwidth, imgheight,colorSpace)]
> +        imagedata=['BI /W %d /H %d /BPC %d /CS /%s /F [/A85 /Fl] ID' %
> +                (imgwidth, imgheight,1 if bpp<8 else 8,colorSpace)]
>
>          #use a flate filter and Ascii Base 85 to compress
>          raw = myimage.tostring()
> -        assert len(raw) == imgwidth*imgheight*bpp, "Wrong amount of data for image"
> +        assert len(raw) == imgwidth*imgheight*bpp/8.0, "Wrong amount of data for image"
>          compressed = zlib.compress(raw) #this bit is very fast...
>          encoded = pdfutils._AsciiBase85Encode(compressed) #...sadly this may not be
>          #append in blocks of 60 characters
> """
>

> The bilevel images MUST have a width that is a multiple of 8. I guess this
> condition can be lifted, but I needed something "yesterday".
>
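
The restriction probably comes from the length check. PIL packs mode '1' data
eight pixels per byte and pads each row to a whole byte, and PDF likewise
expects each image row to start on a byte boundary, so the expected length is
better computed per row than per image. A possible sketch (the function names
are illustrative, not from the patch):

```python
def row_bytes(width, components, bits_per_component):
    """Bytes per image row, padded up to a whole byte."""
    return (width * components * bits_per_component + 7) // 8


def expected_length(width, height, components, bits_per_component):
    """Total sample bytes when every row starts on a byte boundary."""
    return row_bytes(width, components, bits_per_component) * height


# A 10x2 bilevel image: the per-image formula from the patch gives 2.5 bytes,
# while row padding gives 2 bytes per row, so 4 bytes in total.
print(10 * 2 * 1 / 8.0, expected_length(10, 2, 1, 1))
```

For widths that are a multiple of 8, and for 8-bit greyscale, RGB and CMYK,
both formulas agree, so the existing cases would be unaffected.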

> Have a good year!
>

> --
> Jesus Cea Avion
> jcea at jcea.es - http://www.jcea.es/
> jabber / xmpp:jcea at jabber.org
> "Things are not so easy"
> "My name is Dump, Core Dump"
> "Love is to place your happiness in the happiness of another" - Leibniz
