[reportlab-users] pdf's corrupted when emailed, possible solution included
Robin Becker
reportlab-users@reportlab.com
Wed, 14 Apr 2004 17:56:23 +0100
Jeff Johnson wrote:
> Hi, we're using reportlab 1.18 and since we've switched to a postfix
> mail server on linux, our PDFs that are sent from Outlook are
> corrupted. This is apparently because Outlook uses quoted-printable
> encoding instead of base64 to encode the PDF when it doesn't see binary
> data in the first few lines of the attachment. According to the poster
> in the link below, and the Adobe documentation, the solution to this
> problem is to put a comment on the second line of the PDF with binary
> characters in it so applications will know to treat the file as a binary
> file.
> I've downloaded 1.19 and didn't see anything in the change notes
> regarding this and was wondering if there's an easy way to do it or even
> to get it into the next reportlab release as a standard feature?
>
> Text of link included below:
> http://groups.google.com/groups?q=pdf+binary+comment+encoding+outlook&hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=c1lkg9%241hlr%241%40FreeBSD.csie.NCTU.edu.tw&rnum=1
>
>
> Regards,
> Jeff
>
> I think I worked it out.
> The problem is with the PDF.
>
> Within the PDF itself, looking at the first few lines:
> (1) Not working
> %PDF-1.2
> 1 0 obj
> <<
> /Type /Catalog
> /Pages 4 0 R
> /Outlines 2 0 R
>
> (2) Working PDF
> %PDF-1.2
> %âãÏÓ
> 15 0 obj
> <<
> /Linearized 1
>
>
> Notice the 4 binary characters after the header, i.e. after %PDF-1.2.
> I knew this was significant, so I downloaded the PDF Reference v1.5
> from Adobe. Section 3.4.1 says:
>
> <snip>
> Note: If a PDF file contains binary data, as most do (see Section 3.1,
> "Lexical Conventions"), it is recommended that the header line be
> immediately followed by a comment line containing at least four binary
> characters - that is, characters whose codes are 128 or greater. This will
> ensure proper behavior of file transfer applications that inspect data
> near the beginning of a file to determine whether to treat the file's
> contents as text or as binary.
> </snip>
>
> So this makes sense from what I see: Outlook considers the PDF to be
> text, not binary, and so uses quoted-printable.
>
> Drat I can't blame M$
>
> Regards Darryl
>
I'm quite surprised by your report, as I see this in the normal output:
%PDF-1.3
%“Œ‹ž ReportLab Generated PDF document http://www.reportlab.com
i.e. the second line is a comment containing some 'binary' characters.
I remember asking Andy Robinson about this in 2000, so it was certainly
present at that early date. Can you post a minimal script that produces
a PDF without the above?
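
For checking a generated file, here is a rough sketch (not part of
ReportLab; the function name has_binary_comment is made up for
illustration) that tests whether the line after the %PDF header is a
comment with at least four bytes of 128 or greater, as the Adobe note
recommends. It assumes Python 3 and that the file starts directly with
the header.

```python
import re


def has_binary_comment(data: bytes) -> bool:
    """Return True if the second line of a PDF is a comment holding
    at least four bytes >= 128 (hypothetical helper, not a ReportLab API)."""
    # PDF lines may end in CRLF, CR or LF; only the first two lines matter.
    lines = re.split(rb"\r\n|\r|\n", data, maxsplit=2)
    if len(lines) < 2 or not lines[0].startswith(b"%PDF-"):
        return False
    second = lines[1]
    if not second.startswith(b"%"):
        return False
    # Count the bytes with the high bit set on the comment line.
    return sum(1 for b in second if b >= 128) >= 4
```

The first kilobyte is plenty, e.g.
has_binary_comment(open("out.pdf", "rb").read(1024)), where out.pdf
stands for whatever file your script wrote.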
Is it possible that your copy of reportlab/pdfbase/pdfdoc.py has
something other than the following near line 752?
### chapter 5
# Following Ken Lunde's advice and the PDF spec, this includes
# some high-order bytes. I chose the characters for Tokyo
# in Shift-JIS encoding, as these cannot be mistaken for
# any other encoding, and we'll be able to tell if something
# has run our PDF files through a dodgy Unicode conversion.
PDFHeader = (
    "%PDF-1.3"+LINEEND+
    "%\223\214\213\236 ReportLab Generated PDF document http://www.reportlab.com"+LINEEND)
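
To illustrate the comment in that excerpt, the snippet below is a small
sketch (assuming Python 3, where the octal escapes must go in a bytes
literal rather than a plain string) showing that all four bytes are 128
or greater and that they decode as Shift-JIS to the two kanji for Tokyo.
The variable names are only for illustration.

```python
# The four octal escapes from PDFHeader, written as a Python 3 bytes literal.
marker = b"\223\214\213\236"

# Every byte has its high bit set, so the comment line counts as 'binary'.
all_high = all(b >= 128 for b in marker)

# The same bytes in hex, for comparison with a hex dump of the PDF.
marker_hex = marker.hex()

# Decoded as Shift-JIS, the four bytes form two kanji: the word Tokyo.
tokyo = marker.decode("shift_jis")
```

If a mail gateway or editor has re-encoded the file, these four bytes
will no longer match, which is the tell-tale the comment describes.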
--
Robin Becker