[reportlab-users] Reducing use of 7-bit characters

Andy Robinson andy at reportlab.com
Tue May 5 16:54:25 EDT 2009


2009/5/5 Yoann Roman <yroman-reportlab at altalang.com>:

> I made a first pass at #2. It was pretty easy to implement and did the
> trick for Outlook. Before I spend any more time on this, is there a
> reason for the A85 encoding? If not, is there a patch already out there
> to disable this?


When I wrote the very first version (1998?), I used to run 'Hello world'
files through Distiller and see what came out. ASCII base 85
seemed to be in vogue then, so I went for that, as it was more
compact than base 64.
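
As a rough illustration of the size difference, here is a minimal sketch
using Python's standard base64 module (not ReportLab's own encoder; the
sample payload is made up). Base 85 turns every 4 bytes into 5 characters,
while base 64 turns every 3 bytes into 4.

```python
import base64

# Hypothetical sample: 1024 arbitrary bytes with no all-zero 4-byte groups.
payload = bytes(range(256)) * 4

a85 = base64.a85encode(payload)   # 5 output chars per 4 input bytes
b64 = base64.b64encode(payload)   # 4 output chars per 3 input bytes

print(len(payload), len(a85), len(b64))   # 1024 1280 1368
```

Both encodings are pure 7-bit ASCII; base 85 simply carries less overhead
(25% versus about 33%).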

The available filters are listed in section 7.4 of the PDF 1.7 spec
(the ISO one, and the most readable version by far) here:
http://www.adobe.com/devnet/pdf/pdf_reference.html

Since the 'stuff being encoded' is usually PostScript-like
graphics instructions, which are mostly 7-bit ASCII, I think that
just 'not encoding it' would not help, and would leave it
vulnerable to even more corruption during transmission.
However, just using FlateDecode (basically zlib.compress)
without the ASCII85 would yield a nicely binary stream.
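
To make the difference concrete, here is a small sketch, assuming a made-up
content stream and using only Python's standard zlib and base64 modules
rather than ReportLab's actual stream code. The helper is_7bit is
hypothetical and only there for illustration.

```python
import base64
import zlib

# Hypothetical page content: PostScript-like operators, all 7-bit ASCII.
content = b"BT /F1 12 Tf 72 720 Td (Hello world) Tj ET\n" * 50

flate_only = zlib.compress(content)         # zlib/deflate data, as Flate uses
flate_a85 = base64.a85encode(flate_only)    # same data wrapped in ASCII base 85

def is_7bit(data: bytes) -> bool:
    """Return True if every byte is below 0x80."""
    return all(b < 0x80 for b in data)

for name, data in [("raw", content),
                   ("flate only", flate_only),
                   ("flate + a85", flate_a85)]:
    print(name, len(data), "7-bit" if is_7bit(data) else "binary")
```

The raw and base-85 forms are entirely 7-bit, so a mail client could take
them for text. The Flate-only form contains high bytes from the very start
(the default zlib header is 0x78 0x9C) and is also the smallest of the
three.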


LF versus CRLF again came from looking at what Distiller
produced. But maybe a better approach is to get rid of 95% of
the "line wrapping" altogether.
I went to significant lengths to make sure the raw PDF files were
"readable" in an editor, wrapping in sensible places, because we
spent a LOT of time from 1998-2003 staring at the innards of PDF
files. It would be very easy to "not line-wrap" much
of the content. Maybe Outlook is noticing the formatting more than
the coding when assuming this is text.

In summary, let's look at a 'binary' option which does both,
and see if that fools Outlook.

- Andy

