[reportlab-users] Patch: PIL images with ReportLab

Thu Apr 28 05:53:09 EDT 2005

Sam Hunter wrote:
> Yeah, I think it should be okay, but I'm not a Java kid by any means :)
> 
> What I'd really like to do is fix the canvas.drawImage() function so 
> that it would just accept PIL images.
> 
>  It looks like parts of canvas were originally designed to handle PILs, 
> but that functionality was broken when the drawImage() function was 
> written because it does MD5 sums, and needs the rawdata to do the MD5 
> sum with.
>  I fixed that problem (canvas.py lines 564-575), but that has exposed 
> another issue.
> 
>  The canvas.drawImage() function calls 
> PDFImageXObject.loadImageFromSRC(), which seems to be built for 
> loading PIL Image objects, but it is a little messed up because it tries 
> to access their format information with
> "im._image.format"
>  Normal PIL Image objects don't have an "_image" member, and that 
> ".format" is only valid if you haven't changed the PIL image in any way 
> (a resize sets it to None).
>  I am currently working on a way to have the PIL image convert itself to 
> something so that the line "self.loadImageFromJPEG(fp)" (pdfdoc.py line 
> 1827ish) doesn't even need to be run.  As far as I can tell, all of the 
> self.* members that loadImageFromJPEG() sets are available from various 
> PIL information functions, and I would imagine that the 
> self.streamContent could probably be generated with some kind of PIL 
> function as well.  If not, the data in imageFile could.
> 
> Does anyone know if these functions are used for things other than PIL 
> image types?  Has this path through the code been broken at the 
> beginning for so long that it is just way out of date, or was it broken 
> to make something else work?
> 
> thanks :)
> 
> Sam
> 
.....
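Sam's point that drawImage() needs the raw data to compute its MD5 sum can be illustrated with a small sketch. It assumes the digest is taken over the decoded pixel bytes together with the size and mode; `image_cache_key` is a hypothetical name, and ReportLab's own digest may well hash different fields.

```python
import hashlib

def image_cache_key(width, height, mode, raw_data):
    """Return an MD5 hex digest identifying an image by its pixel data.

    Hypothetical helper: ReportLab's actual digest may hash other fields.
    """
    h = hashlib.md5()
    # Include geometry and mode so identical bytes at different sizes differ.
    h.update(f"{width}x{height}:{mode}:".encode("ascii"))
    h.update(raw_data)
    return h.hexdigest()

pixels = bytes([255, 0, 0] * 4)  # a 2x2 all-red RGB image
key = image_cache_key(2, 2, "RGB", pixels)
print(len(key))  # 32
```

Without access to the pixel bytes there is nothing to hash, which is why an object carrying only a PIL image broke this path.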

The ImageReader object is the only non-filename image we're supposed to be 
passing into the back end. We should not be passing raw images around and 
handling them with special-case code everywhere. That's why the _image 
attribute inside it is supposed to be private. The separate loadImageFrom* 
functions exist because images can come from different sources.
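A minimal sketch of that encapsulation, assuming a PIL-like image that offers `size`, `mode`, `convert()` and `tobytes()`. `MiniImageReader`, `FakeImage` and the `opener` parameter are hypothetical names; this is not ReportLab's actual ImageReader, only an illustration of keeping the wrapped image private behind `getSize()` and `getRGBData()`.

```python
class MiniImageReader:
    """Hypothetical ImageReader-style wrapper.

    Accepts a filename or an already-open image object and keeps the
    underlying object private, so back-end code only sees this interface.
    """

    def __init__(self, source, opener=None):
        if isinstance(source, str):
            if opener is None:
                raise ValueError("a filename needs an opener function")
            self._image = opener(source)  # e.g. PIL.Image.open, if available
        else:
            self._image = source          # already an image object

    def getSize(self):
        return self._image.size

    def getRGBData(self):
        # Convert only when the image is not already RGB.
        im = self._image
        if im.mode != "RGB":
            im = im.convert("RGB")
        return im.tobytes()


class FakeImage:
    """Minimal stand-in with the PIL-like attributes used above."""

    def __init__(self, size, mode, data):
        self.size, self.mode, self._data = size, mode, data

    def convert(self, mode):
        raise NotImplementedError("demo image is already RGB")

    def tobytes(self):
        return self._data


reader = MiniImageReader(FakeImage((1, 1), "RGB", b"\xff\x00\x00"))
print(reader.getSize())  # (1, 1)
```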

First off, JPEG is native to PDF and is used even when PIL is not available. 
Thus we make a crude attempt to pick out JPEG files by their extension and pass 
them to loadImageFromJPEG, since converting a JPEG to RGB or CMYK is non-trivial.
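Here is a sketch of the extension heuristic, plus a stricter content check for comparison. The extension list and both function names are assumptions for illustration; the signature check relies only on the fact that every JPEG stream begins with the SOI marker bytes FF D8.

```python
import os

JPEG_EXTENSIONS = {".jpg", ".jpeg", ".jpe"}  # assumed list of extensions

def looks_like_jpeg(filename):
    """Crude check: decide by file extension alone."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in JPEG_EXTENSIONS

def has_jpeg_signature(first_bytes):
    """Stricter check: JPEG data starts with the SOI marker FF D8."""
    return first_bytes[:2] == b"\xff\xd8"

print(looks_like_jpeg("photo.JPG"))  # True
print(looks_like_jpeg("logo.png"))   # False
```

The extension test needs no file access, which fits a path that must work without PIL; the signature test catches misnamed files at the cost of reading a couple of bytes.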

For historical reasons we still attempt to support prebuilt .a85 files, which 
have already been encoded into a PDF stream format and can be used when PIL 
isn't available; that is handled by loadImageFromA85.
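To give a feel for what "prebuilt into a PDF stream format" can mean, here is a sketch assuming the .a85 data is ASCII85-encoded bytes of the kind read by PDF's ASCII85Decode filter. The helper names are hypothetical, and a real .a85 file may also carry an image dictionary or additional filters that this ignores.

```python
import base64

def to_pdf_ascii85(raw):
    """Encode bytes with Adobe-style ASCII85 and append PDF's '~>' EOD marker."""
    return base64.a85encode(raw, wrapcol=72) + b"~>"

def from_pdf_ascii85(encoded):
    """Strip the '~>' marker and decode; embedded newlines are ignored."""
    if not encoded.endswith(b"~>"):
        raise ValueError("missing ~> end-of-data marker")
    return base64.a85decode(encoded[:-2])

print(to_pdf_ascii85(b"\x00\x00\x00\x00"))  # b'z~>'
```

Because the encoding step is plain byte manipulation, data prepared this way can be copied into a PDF without PIL, which is the point of keeping the A85 path.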

Otherwise the path is via loadImageFromSRC (which should really be called 
loadImageFromImageReader). If any encapsulation is to be done I prefer that it 
be restricted to the ImageReader class.
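The three routes described above can be sketched as a single dispatch function. Only the loader names come from this thread; `choose_loader`, the `have_pil` flag and the exact decision rules are assumptions made for illustration.

```python
def choose_loader(source, have_pil):
    """Return the name of the loader a back end would use for `source`.

    Hypothetical dispatch; the rules below are an assumption.
    """
    if isinstance(source, str):
        lower = source.lower()
        if lower.endswith((".jpg", ".jpeg")):
            return "loadImageFromJPEG"  # native PDF format, no PIL needed
        if lower.endswith(".a85"):
            return "loadImageFromA85"   # prebuilt stream, no PIL needed
        if not have_pil:
            raise RuntimeError("PIL is required to load " + source)
    # Everything else (other files, image objects) goes via an ImageReader.
    return "loadImageFromSRC"

print(choose_loader("photo.jpg", have_pil=False))  # loadImageFromJPEG
```

Keeping the JPEG and A85 checks ahead of the PIL requirement means those two formats still work on systems without PIL.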

I strongly oppose removing the special JPEG handling. PIL doesn't exist 
everywhere and we would be foolish to rely on it. The same is true of attempts 
to get PIL to do the formatting into PDF.
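Not relying on PIL usually comes down to treating it as an optional import. A minimal sketch of that pattern; `HAVE_PIL` and `open_with_pil` are illustrative names, not ReportLab's.

```python
try:
    from PIL import Image  # optional dependency
    HAVE_PIL = True
except ImportError:
    Image = None
    HAVE_PIL = False

def open_with_pil(path):
    """Open an image with PIL, failing clearly if PIL is missing."""
    if not HAVE_PIL:
        raise RuntimeError("PIL is not installed; only JPEG and .a85 "
                           "images can be embedded")
    return Image.open(path)
```

The module still imports cleanly without PIL, so the native JPEG and A85 paths remain usable and only PIL-dependent calls fail.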

Allowing the ImageReader class to accept some specific image object instances is 
fine.

So far as I know we are using:

im.getSize
im.getRGBData
im.mode

im.fp
im.format
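The first three members in that list amount to a small interface. One hypothetical way to write it down in modern Python is a runtime-checkable Protocol; `RGBSource` and `Solid` are illustrative names, and this assumes `getSize` returns (width, height) and `getRGBData` returns raw RGB bytes.

```python
from typing import Protocol, Tuple, runtime_checkable

@runtime_checkable
class RGBSource(Protocol):
    """Members the back end uses outside the native-JPEG hack."""
    mode: str
    def getSize(self) -> Tuple[int, int]: ...
    def getRGBData(self) -> bytes: ...

class Solid:
    """A one-pixel blue image that satisfies the interface."""
    def __init__(self):
        self.mode = "RGB"
    def getSize(self):
        return (1, 1)
    def getRGBData(self):
        return b"\x00\x00\xff"

print(isinstance(Solid(), RGBSource))   # True
print(isinstance(object(), RGBSource))  # False
```

Anything providing these members could be handed to the back end, while `fp` and `format` stay outside the interface.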

I'm pretty sure the latter two are only used in a hackish attempt to use a 
PIL-opened JPEG in native form. The reason is as follows: if we don't use the 
native version, we have to convert to RGB, CMYK, etc. Of course, if what you 
really want to do is read a PIL image, apply conversions, and then use the 
result via RGB conversion, then this hack is wrong.
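The condition behind that hack can be made explicit. This sketch assumes an image qualifies for native embedding only while it still reports `format == "JPEG"` and still has a file object; Pillow documents that images produced by operations such as resize() have `format` set to None, and `fp` may also be released once pixel data is loaded. `can_embed_natively` is a hypothetical name.

```python
from types import SimpleNamespace

def can_embed_natively(im):
    """True only for an unmodified, still-open JPEG (hypothetical rule).

    Anything else should fall back to RGB (or CMYK) conversion.
    """
    return (getattr(im, "format", None) == "JPEG"
            and getattr(im, "fp", None) is not None)

opened = SimpleNamespace(format="JPEG", fp=object())  # as if just opened
resized = SimpleNamespace(format=None, fp=None)       # as if from resize()
print(can_embed_natively(opened), can_embed_natively(resized))  # True False
```

Checking both attributes means a modified image falls through to the RGB path instead of silently embedding the original, unmodified JPEG bytes.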

The real pain is that we currently have similar approaches in two places, i.e. 
pdfdoc.py/pdfutils and pdfimages.
-- 
Robin Becker