[reportlab-users] Optimizing greyscale and bilevel images

Jesus Cea jcea at jcea.es
Tue Jan 3 10:45:40 EST 2012

On 31/12/11 20:18, Glenn Linderman wrote:

> In my work with bilevel images, I have found that TIFF with Group
> 4 compression seems to produce the smallest files with lossless
> compression, or DJVU even smaller, but I think just slightly lossy.
> I think the bilevel compression used by PDF is also Group 4?

PDF 1.4 and higher support JBIG2 natively.

> Am I correct that PNG doesn't support Group 4 compression, or am
> I missing an option that would allow bilevel PNG files to be as
> small as TIFF/Group 4 ?

The problem is this:

Current ReportLab takes the picture and, if it is a JPEG file, includes
it as is in the PDF (if it is a greyscale JPEG, it takes advantage of
that). But if the image source is a PIL object, it ALWAYS converts it
to RGB, exports the raw pixel data and simply applies ZLIB (deflate)
compression to the pixel data.

So using a PIL "true color" image with reportlab will produce huge
files. It is better to save the image as a JPEG and import it into
reportlab as a JPEG file, not as a PIL object.

But because of this internal conversion, non "true color" images, like
greyscale, bilevel or indexed images, will waste a lot of space. If
your image is a pie chart with ten colors, it will be inserted in the
PDF as an RGB blob (24 bits per pixel) with the only "improvement" of
a ZLIB wrapping. This is wasteful and slow.
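
To put rough numbers on that waste, here is a back-of-the-envelope
sketch. The 1000x1000 image size is invented purely for illustration,
and the figures are raw sizes before ZLIB, so the final ratios inside
a real PDF will differ.

```python
# Hypothetical 1000x1000 image; the size is an assumption chosen so the
# ratios are easy to read.
width, height = 1000, 1000
pixels = width * height

bits_per_pixel = [
    ("RGB, as inserted today", 24),
    ("greyscale", 8),
    ("indexed, up to 16 colours", 4),
    ("bilevel", 1),
]

raw_bytes = {}
for name, bits in bits_per_pixel:
    # Raw pixel data size in bytes, before any compression.
    raw_bytes[name] = pixels * bits // 8
    print("%-26s %9d bytes before ZLIB" % (name, raw_bytes[name]))
```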

Here we have a few choices:

1. If I remember correctly, when the current code is given an image
filename, it includes the JPEG directly if it is a JPEG. But if the
file has ANY other format (PNG, for instance), it will be imported in
PIL, converted to RGB and inserted as an RGB blob+ZLIB. Besides the
size expansion, you are limited to the formats PIL recognizes.

It would be nice, for instance, to be able to insert JBIG2 files or
TIFF/Group 4 files. Recent versions of the PDF standard support them
natively, so reportlab doesn't need to decode the image; it can insert
the file in the PDF with minimal header parsing, if any (like it
already does with JPEG files). This is nice because, for instance, we
don't have to worry about patents.

ReportLab generates PDF 1.3 files. I don't know what could go wrong
if we increase the version to 1.4, which allows native JBIG2.
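
As a rough sketch of what such a pass-through could look like for
Group 4 data: PDF's /CCITTFaxDecode filter with a negative /K selects
Group 4 encoding, so the compressed bytes can be wrapped in an image
XObject unchanged. The helper below is hypothetical and not part of
reportlab. It assumes the caller has already extracted the bare Group 4
stream (for example from a single-strip TIFF) and it ignores details
such as black/white polarity (/BlackIs1) and multi-strip files.

```python
def ccitt_g4_image_xobject(g4_data, width, height):
    """Wrap already-compressed Group 4 data in a PDF image XObject.

    Hypothetical helper for illustration only. g4_data must be the bare
    CCITT Group 4 stream, not a whole TIFF file.
    """
    header = (
        "<< /Type /XObject /Subtype /Image"
        " /Width %d /Height %d"
        " /ColorSpace /DeviceGray /BitsPerComponent 1"
        " /Filter /CCITTFaxDecode"
        # K < 0 means pure two-dimensional (Group 4) encoding.
        " /DecodeParms << /K -1 /Columns %d /Rows %d >>"
        " /Length %d >>\n"
    ) % (width, height, width, height, len(g4_data))
    # The compressed bytes go into the stream untouched.
    return header.encode("ascii") + b"stream\n" + g4_data + b"\nendstream\n"
```

The point of the design is that nothing is decoded: only the width and
height have to be read from the source file's header.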

2. When inserting a PIL image, current reportlab always converts it to
RGB. I think the lib should natively support bilevel, greyscale and
indexed PIL images. Seems easy enough.

I have written a small and trivial patch to support bilevel (with a
width multiple of 8) and greyscale PIL images. I have patched
"drawInlineImage" because it was way easier than "drawImage" and
enough for my immediate needs. I don't understand the PDF standard's
description well enough to implement indexed images.
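
For the indexed case, a possible starting point (a sketch, not tested
against reportlab): PDF describes a palette image with a colour space of
the form [/Indexed base hival lookup], where hival is the highest valid
index and lookup is the palette as a byte string. The helper below is
hypothetical. It assumes a flat RGB palette list such as the one PIL's
getpalette() returns for a mode 'P' image, and that the pixel data would
then be the one-byte-per-pixel palette indices with /BPC 8.

```python
def indexed_colorspace(palette, ncolors):
    """Build a PDF Indexed colour space array from a flat RGB palette.

    palette is a flat list [r0, g0, b0, r1, g1, b1, ...]; ncolors is the
    number of entries actually in use. Hypothetical helper, illustration
    only.
    """
    if not 1 <= ncolors <= 256:
        raise ValueError("an Indexed colour space holds 1 to 256 entries")
    used = palette[:3 * ncolors]
    if len(used) != 3 * ncolors:
        raise ValueError("palette is shorter than ncolors entries")
    # The lookup table is written as a hexadecimal string.
    lookup = "".join("%02x" % component for component in used)
    # hival is the highest valid index, i.e. ncolors - 1.
    return "[/Indexed /DeviceRGB %d <%s>]" % (ncolors - 1, lookup)

# Three-colour example: black, white, red.
print(indexed_colorspace([0, 0, 0, 255, 255, 255, 255, 0, 0], 3))
# prints: [/Indexed /DeviceRGB 2 <000000ffffffff0000>]
```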

My patch reduces my generated PDF files to half their size, so I am
happy, but I can't invest any more time on this. Holidays are over :).

My patch:

jcea@ubuntu:/usr/lib/python2.6/dist-packages/reportlab/pdfgen$ diff -u pdfimages.py.OLD pdfimages.py
--- pdfimages.py.OLD 2009-02-03 22:26:43.000000000 +0100
+++ pdfimages.py 2011-12-29 00:54:36.923812086 +0100
@@ -100,21 +100,30 @@
         if image.mode == 'CMYK':
             myimage = image
             colorSpace = 'DeviceCMYK'
-            bpp = 4
+            bpp = 4*8
+        elif image.mode == '1' :
+            myimage = image
+            colorSpace = 'DeviceGray'
+            bpp = 1
+        elif image.mode == 'L' :
+            myimage = image
+            colorSpace = 'DeviceGray'
+            bpp = 1*8
         else:
             myimage = image.convert('RGB')
             colorSpace = 'RGB'
-            bpp = 3
+            bpp = 3*8
         imgwidth, imgheight = myimage.size
 
         # this describes what is in the image itself
         # *NB* according to the spec you can only use the short form in inline images
         #imagedata=['BI /Width %d /Height /BitsPerComponent 8 /ColorSpace /%s /Filter [/Filter [ /ASCII85Decode /FlateDecode] ID]' % (imgwidth, imgheight,'RGB')]
-        imagedata=['BI /W %d /H %d /BPC 8 /CS /%s /F [/A85 /Fl] ID' % (imgwidth, imgheight,colorSpace)]
+        imagedata=['BI /W %d /H %d /BPC %d /CS /%s /F [/A85 /Fl] ID' %
+                    (imgwidth, imgheight,1 if bpp<8 else 8,colorSpace)]
 
         #use a flate filter and Ascii Base 85 to compress
         raw = myimage.tostring()
-        assert len(raw) == imgwidth*imgheight*bpp, "Wrong amount of data for image"
+        assert len(raw) == imgwidth*imgheight*bpp/8.0, "Wrong amount of data for image"
         compressed = zlib.compress(raw) #this bit is very fast...
         encoded = pdfutils._AsciiBase85Encode(compressed) #...sadly this may not be
         #append in blocks of 60 characters

The bilevel images MUST have a width that is a multiple of 8. I guess
this condition can be lifted, but I needed something "yesterday".
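
The width restriction comes from the assert in the patch, which expects
exactly width*height*bpp/8 bytes. One way it might be lifted is to
compute the expected length per row rather than per image. This sketch
assumes (worth verifying against your PIL version) that tostring() pads
every row of a mode '1' image up to a whole byte, which is also the row
alignment PDF expects for image samples. The helper name is
hypothetical.

```python
def expected_raw_length(width, height, bits_per_pixel):
    """Bytes of raw pixel data when each row is padded up to a whole byte."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height

# Width 10 is not a multiple of 8: each bilevel row needs 2 bytes.
print(expected_raw_length(10, 3, 1))    # 6
# Width 16 is a multiple of 8: same result as 16*3*1/8.
print(expected_raw_length(16, 3, 1))    # 6
# Whole-byte pixel formats are unaffected by the padding.
print(expected_raw_length(10, 3, 24))   # 90
```

In the patch, the assert would then compare len(raw) with
expected_raw_length(imgwidth, imgheight, bpp).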

Have a good year!

--
Jesus Cea Avion _/_/ _/_/_/ _/_/_/
jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/
jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/
. _/_/ _/_/ _/_/ _/_/ _/_/
"Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/
"My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz