[reportlab-users] pdf's corrupted when emailed, possible solution included

Jeff Johnson reportlab-users@reportlab.com
Wed, 14 Apr 2004 13:44:08 -0400


Hi Robin,

It looks like you're right.  My editor was showing those high-order 
characters as spaces and I didn't notice.  I should have a test account 
on a Windows box later today so I can see why Outlook is still using 
quoted-printable on these files.  Sorry for the confusion.  I'll report 
what I find :)

Regards,
Jeff

Robin Becker wrote:

> Jeff Johnson wrote:
>
>> Hi, we're using ReportLab 1.18 and since we've switched to a Postfix 
>> mail server on Linux, our PDFs that are sent from Outlook are 
>> corrupted.  This is apparently because Outlook uses quoted-printable 
>> encoding instead of base64 to encode the PDF when it doesn't see 
>> binary data in the first few lines of the attachment.  According to 
>> the poster in the link below, and the Adobe documentation, the 
>> solution to this problem is to put a comment on the second line of 
>> the PDF with binary characters in it so applications will know to 
>> treat the file as a binary file.
>> I've downloaded 1.19 and didn't see anything in the change notes 
>> regarding this and was wondering if there's an easy way to do it or 
>> even to get it into the next reportlab release as a standard feature?
>>
>> Text of link included below:
>> http://groups.google.com/groups?q=pdf+binary+comment+encoding+outlook&hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=c1lkg9%241hlr%241%40FreeBSD.csie.NCTU.edu.tw&rnum=1 
>>
>>
>> Regards,
>> Jeff
>>
>> I think I worked it out.
>> The problem is with the *pdf*
>>
>> Within the *PDF* itself - looking at the first few lines
>> (1) Not working
>> %PDF-1.2
>> 1 0 obj
>> <<
>> /Type /Catalog
>> /Pages 4 0 R
>> /Outlines 2 0 R
>>
>> (2) Working *PDF*
>> %PDF-1.2
>> %=E2=E3=CF=D3
>> 15 0 obj
>> <<
>> /Linearized 1
>>
>>
>> Notice the 4 *binary* characters after the header, i.e. after %PDF-1.2
>> I knew this was significant so I downloaded the *PDF* Reference v1.5 
>> from Adobe. Section 3.4.1 says:
>>
>> <snip>
>> Note: If a *PDF* file contains *binary* data, as most do (see Section 
>> 3.1, "Lexical Conventions"), it is recommended that the header line be
>> immediately followed by a *comment* line containing at least four 
>> *binary* characters, that is, characters whose codes are 128 or 
>> greater. This will
>> ensure proper behavior of file transfer applications that inspect data
>> near the beginning of a file to determine whether to treat the file's
>> contents as text or as *binary*.
>> </snip>
>>
>> So this makes sense from what I see: *Outlook* considers the 
>> *pdf* as text, not *binary*, and so uses quoted-printable.
>>
>> Drat I can't blame M$
>>
>> Regards Darryl
>>
> I'm quite surprised by your report as I see this in the normal output
>
> %PDF-1.3
> % ReportLab Generated PDF document http://www.reportlab.com
>
> i.e. the second line is a comment containing some 'binary' characters.
>
> I remember asking Andy Robinson about this in 2000 so it was certainly 
> present at that early date. Can you post a minimal script that 
> produces a PDF without the above?
>
> Is it possible that your reportlab/pdfbase/pdfdoc.py has something 
> other than the following near line 752?
>
> ### chapter 5
> # Following Ken Lunde's advice and the PDF spec, this includes
> # some high-order bytes.  I chose the characters for Tokyo
> # in Shift-JIS encoding, as these cannot be mistaken for
> # any other encoding, and we'll be able to tell if something
> # has run our PDF files through a dodgy Unicode conversion.
> PDFHeader = (
> "%PDF-1.3"+LINEEND+
> "%\223\214\213\236 ReportLab Generated PDF document http://www.reportlab.com"+LINEEND)
>
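
For anyone who wants to check their own output, here is a minimal sketch
(not part of ReportLab; the function name has_binary_marker and the sample
constants are made up for illustration). It tests whether the line after
the %PDF header is a comment containing at least four bytes of value 128
or greater, as the Adobe note quoted above recommends. It assumes Python 3
and that the start of the file is available as a bytes object.

```python
def has_binary_marker(pdf_bytes: bytes, minimum: int = 4) -> bool:
    """Return True if the line after the %PDF header is a comment
    containing at least `minimum` bytes with values >= 128."""
    lines = pdf_bytes.splitlines()  # bytes.splitlines handles \n, \r and \r\n
    if len(lines) < 2 or not lines[0].startswith(b"%PDF-"):
        return False
    second = lines[1]
    if not second.startswith(b"%"):
        return False  # second line is not a comment at all
    high_bytes = sum(1 for b in second if b >= 128)
    return high_bytes >= minimum


# Header as built by the PDFHeader quoted above; the octal escapes
# \223\214\213\236 are the bytes 0x93 0x8C 0x8B 0x9E.
REPORTLAB_HEADER = (
    b"%PDF-1.3\r\n"
    b"%\x93\x8c\x8b\x9e ReportLab Generated PDF document "
    b"http://www.reportlab.com\r\n"
)

# The two samples from Darryl's post: (1) not working, (2) working.
NO_MARKER = b"%PDF-1.2\n1 0 obj\n<<\n/Type /Catalog\n"
WITH_MARKER = b"%PDF-1.2\n%\xe2\xe3\xcf\xd3\n15 0 obj\n<<\n/Linearized 1\n"

if __name__ == "__main__":
    samples = [
        ("reportlab", REPORTLAB_HEADER),
        ("no marker", NO_MARKER),
        ("with marker", WITH_MARKER),
    ]
    for name, data in samples:
        print(name, has_binary_marker(data))
```

To test a real file, pass in something like open(path, "rb").read(1024).
If the check fails on a file that ReportLab produced, the high-order bytes
were probably altered or stripped somewhere after generation.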