[reportlab-users] pdf's corrupted when emailed, possible solution included

Robin Becker reportlab-users@reportlab.com
Wed, 14 Apr 2004 17:56:23 +0100


Jeff Johnson wrote:

> Hi, we're using reportlab 1.18 and since we've switched to a postfix 
> mail server on linux, our PDFs that are sent from Outlook are 
> corrupted.  This is apparently because Outlook uses quoted-printable 
> encoding instead of base64 to encode the PDF when it doesn't see binary 
> data in the first few lines of the attachment.  According to the poster 
> in the link below, and the Adobe documentation, the solution to this 
> problem is to put a comment on the second line of the PDF with binary 
> characters in it so applications will know to treat the file as a binary 
> file.
> I've downloaded 1.19 and didn't see anything in the change notes 
> regarding this and was wondering if there's an easy way to do it or even 
> to get it into the next reportlab release as a standard feature?
> 
> Text of link included below:
> http://groups.google.com/groups?q=pdf+binary+comment+encoding+outlook&hl=en&lr=&ie=UTF-8&oe=UTF-8&selm=c1lkg9%241hlr%241%40FreeBSD.csie.NCTU.edu.tw&rnum=1 
> 
> 
> Regards,
> Jeff
> 
> I think I worked it out.
> The problem is with the PDF.
> 
> Within the PDF itself, looking at the first few lines:
> (1) Not working
> %PDF-1.2
> 1 0 obj
> <<
> /Type /Catalog
> /Pages 4 0 R
> /Outlines 2 0 R
> 
> (2) Working PDF
> %PDF-1.2
> %=E2=E3=CF=D3
> 15 0 obj
> <<
> /Linearized 1
> 
> 
> Notice the 4 binary characters after the header, i.e. after %PDF-1.2.
> I knew this was significant, so I downloaded the PDF Reference v1.5
> from Adobe. Under section 3.4.1:
> 
> <snip>
> Note: If a PDF file contains binary data, as most do (see Section 3.1,
> "Lexical Conventions"), it is recommended that the header line be
> immediately followed by a comment line containing at least four binary
> characters -- that is, characters whose codes are 128 or greater. This will
> ensure proper behavior of file transfer applications that inspect data
> near the beginning of a file to determine whether to treat the file's
> contents as text or as binary.
> </snip>
> 
> So this makes sense from what I see: Outlook considers the PDF to be
> text, not binary, and so uses quoted-printable.
> 
> Drat I can't blame M$
> 
> Regards Darryl
> 
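For reference, the "%=E2=E3=CF=D3" line in the working example above looks 
like four high-order bytes after they have been through quoted-printable 
encoding. A small sketch with Python's standard quopri module (the sample 
bytes are taken from the quoted example; nothing here is reportlab code):

```python
import quopri

# Second line of the "working" PDF: a comment with four bytes >= 128.
binary_comment = b"%\xe2\xe3\xcf\xd3\n"

# Quoted-printable writes each non-ASCII byte as "=XX" in hex.
encoded = quopri.encodestring(binary_comment)

# Decoding restores the original bytes.
decoded = quopri.decodestring(encoded)
```

Every byte of 128 or more becomes a three-character "=XX" sequence, which 
is why that line looks the way it does in a mail archive.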
I'm quite surprised by your report, as I see this in the normal output:

%PDF-1.3
%“Œ‹ž ReportLab Generated PDF document http://www.reportlab.com

i.e. the second line is a comment containing some 'binary' characters.

I remember asking Andy Robinson about this in 2000, so it was certainly 
present at that early date. Can you post a minimal script that produces 
a PDF without the above?
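A quick way to check a generated file is to look at its first two lines. 
The following is only a rough sketch, not part of reportlab; the function 
name has_binary_comment and the threshold of four bytes (taken from the 
Adobe note quoted above) are my own choices for illustration:

```python
def has_binary_comment(data, min_high_bytes=4):
    """Return True if the line after the %PDF- header is a comment
    with at least min_high_bytes bytes whose values are >= 128."""
    # Only the start of the file matters; avoid splitting the whole PDF.
    lines = data[:1024].splitlines()
    if len(lines) < 2 or not lines[0].startswith(b"%PDF-"):
        return False
    second = lines[1]
    if not second.startswith(b"%"):
        return False
    high = sum(1 for b in bytearray(second) if b >= 128)
    return high >= min_high_bytes
```

To use it, read the file in binary mode, e.g. 
has_binary_comment(open("out.pdf", "rb").read()), where "out.pdf" stands 
for whatever file your script writes.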

Is it possible that reportlab/pdfbase/pdfdoc.py has something different 
at lines near 752 other than

### chapter 5
# Following Ken Lunde's advice and the PDF spec, this includes
# some high-order bytes.  I chose the characters for Tokyo
# in Shift-JIS encoding, as these cannot be mistaken for
# any other encoding, and we'll be able to tell if something
# has run our PDF files through a dodgy Unicode conversion.
PDFHeader = (
"%PDF-1.3"+LINEEND+
"%\223\214\213\236 ReportLab Generated PDF document http://www.reportlab.com"+LINEEND)
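To see that the octal escapes in that header really are four high-order 
bytes, here is a small standalone sketch. It is written for a current 
Python 3 rather than the Python of this release, and the LINEEND value is 
an assumption made only so the snippet runs on its own:

```python
LINEEND = "\r\n"  # assumed value, for illustration only

header = (
    "%PDF-1.3" + LINEEND +
    "%\223\214\213\236 ReportLab Generated PDF document "
    "http://www.reportlab.com" + LINEEND)

# Map each character 1:1 to a byte, as it would appear in the file.
raw = header.encode("latin-1")

# The four bytes right after the '%' on the second line.
marker = raw.splitlines()[1][1:5]

# The code comment says these spell Tokyo in Shift-JIS.
tokyo = marker.decode("shift_jis")
```

Here marker is the byte string 0x93 0x8C 0x8B 0x9E, all of which are 128 
or greater, which is what the PDF spec recommends.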

-- 
Robin Becker