[reportlab-users] pdf's corrupted when emailed, possible solution
Wed, 14 Apr 2004 13:44:08 -0400
It looks like you're right. My editor was showing those high-order
characters as spaces and I didn't notice. I should have a test account
on a Windows box later today so I can see why Outlook is still using
quoted-printable on these files. Sorry for the confusion. I'll report
what I find :)
Robin Becker wrote:
> Jeff Johnson wrote:
>> Hi, we're using ReportLab 1.18, and since we've switched to a Postfix
>> mail server on Linux, our PDFs that are sent from Outlook are
>> corrupted. This is apparently because Outlook uses quoted-printable
>> encoding instead of base64 to encode the PDF when it doesn't see
>> binary data in the first few lines of the attachment. According to
>> the poster in the link below, and the Adobe documentation, the
>> solution to this problem is to put a comment on the second line of
>> the PDF with binary characters in it so applications will know to
>> treat the file as a binary file.
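The behaviour described here can be modelled with a small sniffing function. This is a hypothetical Python sketch, not Outlook's actual logic (the thread does not document it): the helper names `looks_binary` and `choose_transfer_encoding`, and the 1024-byte window standing in for "the first few lines", are all assumptions.

```python
# Hypothetical model of a mail client's text-vs-binary sniffing; the real
# Outlook heuristic is not documented in this thread.
SNIFF_WINDOW = 1024  # assumed stand-in for "the first few lines"


def looks_binary(data: bytes, window: int = SNIFF_WINDOW) -> bool:
    """True if any of the first `window` bytes has a code of 128 or more."""
    return any(byte >= 128 for byte in data[:window])


def choose_transfer_encoding(data: bytes) -> str:
    """Pick base64 for data that looks binary, quoted-printable otherwise."""
    return "base64" if looks_binary(data) else "quoted-printable"


# A PDF start with no marker line versus one with ReportLab-style high bytes.
unmarked = b"%PDF-1.3\n1 0 obj\n<< /Type /Catalog >>\n"
marked = b"%PDF-1.3\n%\x93\x8c\x8b\x9e ReportLab Generated PDF document\n"
print(choose_transfer_encoding(unmarked))  # quoted-printable
print(choose_transfer_encoding(marked))    # base64
```

Under this model, putting a few high-order bytes on line two, as the Adobe note recommends, is enough to flip the choice to base64.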
>> I've downloaded 1.19 and didn't see anything about this in the change
>> notes. Is there an easy way to do it, or could it even go into the
>> next ReportLab release as a standard feature?
>> Text of link included below:
>> I think I worked it out.
>> The problem is with the PDF itself. Looking at the first few lines:
>> (1) Not working
>> 1 0 obj
>> /Type /Catalog
>> /Pages 4 0 R
>> /Outlines 2 0 R
>> (2) Working PDF
>> 15 0 obj
>> /Linearized 1
>> Notice the four binary characters after the header, i.e. after %PDF-1.2.
>> I knew this was significant, so I downloaded the PDF Reference v1.5
>> from Adobe. Section 3.4.1 says:
>> Note: If a PDF file contains binary data, as most do (see Section
>> "Lexical Conventions"), it is recommended that the header line be
>> immediately followed by a comment line containing at least four
>> binary characters (that is, characters whose codes are 128 or greater).
>> This ensures proper behavior of file transfer applications that inspect
>> data near the beginning of a file to determine whether to treat the
>> file's contents as text or as binary.
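A quick way to check a file against that recommendation is to look at the line right after the `%PDF-` header and count its bytes of 128 or greater. A minimal Python sketch, assuming only the first 1024 bytes need inspecting; `has_binary_comment` is a hypothetical helper, not part of ReportLab:

```python
def has_binary_comment(pdf: bytes, min_high_bytes: int = 4) -> bool:
    """Return True if the line after the %PDF- header is a comment line
    holding at least `min_high_bytes` bytes with codes of 128 or greater."""
    # Only the start of the file matters; 1024 bytes is an assumed bound.
    lines = pdf[:1024].splitlines()  # bytes.splitlines handles LF, CR, CRLF
    if len(lines) < 2 or not lines[0].startswith(b"%PDF-"):
        return False
    second = lines[1]
    if not second.startswith(b"%"):
        return False  # not a comment line at all
    return sum(1 for byte in second if byte >= 128) >= min_high_bytes


print(has_binary_comment(b"%PDF-1.3\n%\x93\x8c\x8b\x9e ReportLab\n1 0 obj\n"))  # True
print(has_binary_comment(b"%PDF-1.3\n1 0 obj\n"))  # False
```

Called on `open(path, "rb").read()`, it reports whether a given file carries the marker line at all.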
>> So this makes sense from what I see: Outlook considers the PDF to be
>> text, not binary, and so uses quoted-printable.
>> Drat, I can't blame M$.
>> Regards Darryl
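For reference, this is what quoted-printable does to the high-order bytes in question, using Python's standard `quopri` module: every byte of 128 or above becomes a three-character `=XX` escape, and a line ending in `=` is a soft line break that the decoder removes. It illustrates the encoding only; it does not reproduce whatever step in the Outlook/Postfix path actually corrupted the files.

```python
import quopri

# The ReportLab marker line: four bytes >= 128 followed by plain ASCII.
sample = b"%\x93\x8c\x8b\x9e ReportLab Generated PDF document\n"

# Quoted-printable escapes each high byte as "=XX" (three bytes for one).
encoded = quopri.encodestring(sample)
print(encoded)

# In principle the encoding is reversible: decoding restores the bytes.
assert quopri.decodestring(encoded) == sample

# A line ending in "=" is a soft line break, joined back up on decoding.
print(quopri.decodestring(b"considers the pdf as tex=\nt, not binary"))
# b'considers the pdf as text, not binary'
```

The same escaping is why forwarded mail sometimes shows stray sequences like `=93` and `=94` where curly quotes used to be.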
> I'm quite surprised by your report, as I see this in the normal output:
> %“Œ‹ž ReportLab Generated PDF document http://www.reportlab.com
> i.e. the second line is a comment containing some 'binary' characters.
> I remember asking Andy Robinson about this in 2000, so it was certainly
> present at that early date. Can you post a minimal script that
> produces a PDF without the above?
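The four odd-looking characters in that header line are the octal escapes `\223\214\213\236` from the pdfdoc.py comment quoted below. Decoding them with Python's built-in codecs confirms the comment's claim that they spell Tokyo in Shift-JIS, and shows why they display as “Œ‹ž in a Windows-1252 reader. A small sketch; the variable names are illustrative only:

```python
# The high-order bytes from ReportLab's PDFHeader, written as octal escapes
# exactly as in pdfdoc.py; in a bytes literal they are 0x93 0x8C 0x8B 0x9E.
marker = b"\223\214\213\236"

# Every byte is >= 128, so the line counts as "binary" under the PDF spec.
print(all(byte >= 128 for byte in marker))  # True

# Read as Shift-JIS, the two double-byte characters are Tokyo.
as_sjis = marker.decode("shift_jis")    # "東京"

# Read as Windows-1252 (typical for a Windows mail client), they are four
# unrelated single-byte glyphs, matching the header line shown above.
as_cp1252 = marker.decode("cp1252")     # "“Œ‹ž"
```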
> Is it possible that your reportlab/pdfbase/pdfdoc.py has something
> near line 752 that differs from the following?
> ### chapter 5
> # Following Ken Lunde's advice and the PDF spec, this includes
> # some high-order bytes. I chose the characters for Tokyo
> # in Shift-JIS encoding, as these cannot be mistaken for
> # any other encoding, and we'll be able to tell if something
> # has run our PDF files through a dodgy Unicode conversion.
> PDFHeader = (
> "%\223\214\213\236 ReportLab Generated PDF document