[reportlab-users] pdf's corrupted when emailed, possible solution
Wed, 14 Apr 2004 17:56:23 +0100
Jeff Johnson wrote:
> Hi, we're using ReportLab 1.18 and since we've switched to a Postfix
> mail server on Linux, our PDFs that are sent from Outlook are
> corrupted. This is apparently because Outlook uses quoted-printable
> encoding instead of base64 to encode the PDF when it doesn't see binary
> data in the first few lines of the attachment. According to the poster
> in the link below, and the Adobe documentation, the solution to this
> problem is to put a comment on the second line of the PDF with binary
> characters in it so applications will know to treat the file as a binary file.
> I've downloaded 1.19 and didn't see anything in the change notes
> regarding this and was wondering if there's an easy way to do it or even
> to get it into the next reportlab release as a standard feature?
> Text of link included below:
> I think I worked it out.
> The problem is with the PDF.
> Within the PDF itself, looking at the first few lines:
> (1) Not working
> 1 0 obj
> /Type /Catalog
> /Pages 4 0 R
> /Outlines 2 0 R
> (2) Working PDF
> 15 0 obj
> /Linearized 1
> Notice the 4 binary characters after the header, i.e. after %PDF-1.2
> I knew this was significant so I downloaded the PDF reference v 1.5 guide
> from Adobe. Under chapter 3.4.1:
> Note: If a PDF file contains binary data, as most do (see Section 3.1,
> “Lexical Conventions”), it is recommended that the header line be
> immediately followed by a comment line containing at least four binary
> characters, that is, characters whose codes are 128 or greater. This will
> ensure proper behavior of file transfer applications that inspect data
> near the beginning of a file to determine whether to treat the file’s
> contents as text or as binary.
> So this makes sense from what I see: Outlook considers the PDF
> as text, not binary, and so uses quoted-printable.
> Drat I can't blame M$
> Regards Darryl
I'm quite surprised by your report, as I see this in the normal output:
%“Œ‹ž ReportLab Generated PDF document http://www.reportlab.com
i.e. the second line is a comment containing some 'binary' characters.
I remember asking Andy Robinson about this in 2000, so it was certainly
present at that early date. Can you post a minimal script that produces
a PDF without the above?
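
One way to check an existing file for the comment line is sketched below.
This is only an illustration, not part of ReportLab: the function name
has_binary_comment is hypothetical, it assumes Python 3 (where iterating
over bytes yields integers), and it only tests the recommendation quoted
above (a comment line right after the header with at least four bytes of
128 or greater).

```python
def has_binary_comment(data: bytes, minimum: int = 4) -> bool:
    """Return True if the line after the %PDF- header is a comment
    holding at least `minimum` bytes with codes of 128 or greater.

    Hypothetical helper for illustration; not a ReportLab function.
    """
    # bytes.splitlines() breaks on \n, \r and \r\n, the line endings
    # a PDF file may use.
    lines = data.splitlines()
    if len(lines) < 2 or not lines[0].startswith(b"%PDF-"):
        return False
    second = lines[1]
    # A PDF comment starts with a percent sign.
    if not second.startswith(b"%"):
        return False
    high_bytes = sum(1 for byte in second if byte >= 128)
    return high_bytes >= minimum
```

Passing it the first kilobyte or so of a file opened in 'rb' mode should be
enough, since only the first two lines are inspected.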
Is it possible that reportlab/pdfbase/pdfdoc.py has something other than
the following near line 752?
### chapter 5
# Following Ken Lunde's advice and the PDF spec, this includes
# some high-order bytes. I chose the characters for Tokyo
# in Shift-JIS encoding, as these cannot be mistaken for
# any other encoding, and we'll be able to tell if something
# has run our PDF files through a dodgy Unicode conversion.
PDFHeader = (
"%\223\214\213\236 ReportLab Generated PDF document