[reportlab-team] [reportlab-users] reportlab and CMYK images, take 2

Mon, 06 Oct 2003 00:16:36 +0200

Andy Robinson wrote:
>>Ok, I've done some research and testing, and I think I'd like to do the
>>following:
>>
>>- rely on PIL as much as possible. That means for instance not to use
>>pdfutils.readJPEGInfo().
> 
> 
> Here are two problems.  First, there are many times when a C compiler
> is not available at a customer site. We at least have the ability to
> handle JPEGs natively at the moment.
> Second, we're adding Java support.  The current CVS code can
> import images in Jython, wrapping up java.awt.image instead
> of PIL.  Whatever we add, we ought to make sure we add
> in the Java wrapper too.
>
> In other words, PIL should be completely encapsulated.

Ok, then maybe the PIL_image wrapper should incorporate that code.
Actually, the functionality I need from PIL when reading from files is
pure Python in PIL as well. If there are no license obstacles, it could
just be copied.
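
To make the "python-only" point concrete, here is a rough sketch of
reading a JPEG's size and component count without any C extension. It
is not the code from PIL or from pdfutils.readJPEGInfo(); the name
read_jpeg_info is made up for illustration. Four components usually
means CMYK or YCCK.

```python
import struct

# Start-of-frame markers carry the image dimensions. 0xC4 (DHT),
# 0xC8 (JPG) and 0xCC (DAC) share the range but are not frame headers.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}
# Markers that stand alone, without a length field.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def read_jpeg_info(data):
    """Return (width, height, components) from raw JPEG bytes.

    Hypothetical helper: walks the marker segments until it finds a
    start-of-frame header.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:          # fill byte before a marker
            pos += 1
            continue
        if marker in STANDALONE:    # no length field to skip
            pos += 2
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            # precision (1 byte), height, width (2 each), components (1)
            _, height, width, components = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            return width, height, components
        pos += 2 + length           # marker bytes plus segment length
    raise ValueError("no start-of-frame segment found")
```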

>>[snip] 
> 
>>In that context, what is PDFImage.format() about?
> 
> 
> This writes out the image object in the format PDF requires.
> Not just the stream of compressed pixels, but also the dictionary
> beforehand describing the color space, bits per pixel,
> width, height etc.  All our objects have a format() method
> which writes out a stream of stuff to go in the PDF file.
> 
> (A PDFDocument object has to be passed as an argument so
> that objects involving cross-references can be resolved,
> but it makes no difference to images).

But this isn't used when writing on a canvas, right? I'm somewhat 
confused because of the duplicated functionality/information.
On one hand, the pdf dictionary is hardcoded into (jpg|PIL)_imagedata, 
OTOH there's what you describe above. Is this the result of "work in 
progress"?
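
For reference, the dictionary in question looks roughly like what the
sketch below produces. The function name and signature are invented for
illustration and are not reportlab's API; a real stream dictionary would
also need a /Length entry, which is left out here.

```python
def image_dict(width, height, color_space, bits_per_component=8,
               filters=()):
    """Format a PDF image XObject dictionary as a string.

    Hypothetical helper; the key names are the standard PDF ones.
    """
    entries = [
        "/Type /XObject",
        "/Subtype /Image",
        "/Width %d" % width,
        "/Height %d" % height,
        "/ColorSpace /%s" % color_space,
        "/BitsPerComponent %d" % bits_per_component,
    ]
    if filters:
        names = " ".join("/" + name for name in filters)
        entries.append("/Filter [ %s ]" % names)
    return "<< " + " ".join(entries) + " >>"
```

Keeping this in one place would mean the canvas path and the format()
path could both call it instead of each hardcoding the dictionary.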

>>- In addition to the specialized "converter" methods, offer a generic
>>converter method, which uses PIL to convert the image to "raw" format,
>>but preserves the separation (i.e. never convert from CMYK->RGB or RGB
>>->CMYK). This conflicts with what you wrote above, but I don't think
>>it's wise to offer that conversion.
>>RGB->CMYK is dependent on the output medium, and Acrobat Reader is
>>capable of displaying CMYK images, so I don't see the need for conversion.
> 
> 
> I understand that you want to read in CMYK images and display
> them as such, which makes a lot of sense.  But I think it's
> also strange to let people mix RGB and CMYK models in
> one document.  If we moved to proper support for professional
> printing, I think it might be better to 'declare a color palette'
> somehow.  So if you say you are doing a CMYK document,
> all images and colors get converted to CMYK, or you are only
> permitted to use those colors.

I think it's a bad idea to get into this: you'd effectively be taking
responsibility for doing the conversion right (which is impossible),
instead of letting the output software/driver do it. Everything I've
read advises against this. If someone knows exactly what color profile
to use, he can do it himself with something like pycms (Python bindings
to littleCMS).
One example for mixing different color spaces: say you want to produce
a PDF document which mixes a company logo and photos. The logo is in
CMYK and has some areas with (1,0,0,0) CMYK color. The photos are RGB.
Now you could either convert the logo to RGB, which will probably fsck 
up the monotone color areas and lead to dithering on the printer, or you 
can convert the photos to CMYK, for which you need to know the color 
profile of the output medium (and which might be hidden somewhere in the 
printer).
The best approach in the above situation is to insert every image
as-is, and let the displaying/printing application do the appropriate
conversions.
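
A minimal sketch of that as-is policy, assuming the image mode is given
as a PIL-style mode string. The function name is hypothetical; the point
is only that an unknown mode raises an error instead of triggering a
silent conversion.

```python
# PIL-style mode string -> (PDF device color space, components per pixel)
MODE_TO_COLOR_SPACE = {
    "L": ("DeviceGray", 1),
    "RGB": ("DeviceRGB", 3),
    "CMYK": ("DeviceCMYK", 4),
}


def color_space_for_mode(mode):
    """Return the PDF color space matching the image's own mode.

    Never converts between color models: unsupported modes are
    rejected so the caller has to decide explicitly what to do.
    """
    try:
        return MODE_TO_COLOR_SPACE[mode]
    except KeyError:
        raise ValueError("no pass-through color space for mode %r" % mode)
```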

Additionally, the PDF specification allows for "alternate images", and
talks specifically about including multiple versions of an image
differing only by color space.
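
As far as I read the spec, the base image dictionary gets an
/Alternates array whose entries point at the other versions. A rough
sketch of formatting that entry follows; the helper name and the
(object number, flag) input format are made up, and generation number 0
is assumed for every reference.

```python
def alternates_entry(alternates):
    """Format an /Alternates entry for an image dictionary.

    alternates: iterable of (object_number, default_for_printing)
    pairs, one per alternate image XObject (hypothetical input format).
    """
    items = []
    for obj_num, for_printing in alternates:
        flag = "true" if for_printing else "false"
        items.append("<< /Image %d 0 R /DefaultForPrinting %s >>"
                     % (obj_num, flag))
    return "/Alternates [ " + " ".join(items) + " ]"
```

So an RGB base image could carry a CMYK alternate marked as the default
for printing, with neither version derived from the other by reportlab.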

> 
>>- Mid term goal: Unify image XObjects and inline images.
> 
> 
> What do you mean?  One method, with an argument to say if
> it goes inline or externally?  That would make sense although
> a lot of the code to produce them is shared already.

Uh, as I see it, the complete functionality of inline images and 
PDFImageXObject objects is duplicated. Am I missing something?
I must confess, though, that I mainly looked at pdfimages.py until now,
I don't quite grok the rest of the structure.

Anyway, I thought one of these (PDFImageXObject, PDFImage) should just 
inherit the other.

But on second thought, maybe this whole image stuff should be solved
by writing a factory class, and the whole ugly image converting and
caching business should be done in the factory, which would return a
nice clean PDFImage instance.
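
Very roughly, the factory idea could look like the sketch below. Both
classes are illustrative stand-ins, not the existing reportlab classes,
and the loader callable (returning width, height, mode, data) is an
assumption made to keep PIL or java.awt.image behind one interface.

```python
class PDFImage:
    """Plain value object: everything needed to write the image out."""

    def __init__(self, width, height, color_space, data):
        self.width = width
        self.height = height
        self.color_space = color_space
        self.data = data


class PDFImageFactory:
    """Hides loading and caching; hands out PDFImage objects."""

    _COLOR_SPACES = {"L": "DeviceGray", "RGB": "DeviceRGB",
                     "CMYK": "DeviceCMYK"}

    def __init__(self, loader):
        # loader(filename) -> (width, height, mode, data); this is the
        # only place that would know about PIL or the Java wrapper.
        self._loader = loader
        self._cache = {}

    def get(self, filename):
        """Return the PDFImage for filename, loading it at most once."""
        if filename not in self._cache:
            width, height, mode, data = self._loader(filename)
            color_space = self._COLOR_SPACES[mode]
            self._cache[filename] = PDFImage(width, height,
                                             color_space, data)
        return self._cache[filename]
```

Whether the result ends up inline or as an XObject would then be a
decision for whoever consumes the PDFImage, not for the loading code.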

Btw., what is the oldest Python version reportlab is targeted at? Last
I read, Python 1.5.2 compatibility isn't required.

cheers,
oliver