[reportlab-users] IronPython and ReportLab
Tim Roberts
timr at probo.com
Thu Aug 20 13:55:25 EDT 2009
Marco Parenzan wrote:
>
>
>
> I’m trying to use IronPython 2.0.2 with ReportLab PDF. I have
> downloaded it (version 2.3) with FePy for unicodedata.py.
>
> I used this code for testing (from Magnus Lie Hetland “Beginning Python”):
>
> ...
>
> I have discovered that the problem is in this string, that is the
> header for the PDF file:
>
>
> # Following Ken Lunde's advice and the PDF spec, this includes
> # some high-order bytes. I chose the characters for Tokyo
> # in Shift-JIS encoding, as these cannot be mistaken for
> # any other encoding, and we'll be able to tell if something
> # has run our PDF files through a dodgy Unicode conversion.
> PDFHeader = (
> "%PDF-1.3"+LINEEND+
> "%\223\214\213\236 ReportLab Generated PDF document http://www.reportlab.com"+LINEEND)
>
> This is prepended to the str variable. The problem is in
> the four characters \223\214\213\236, which are badly converted into
> Unicode. According to the comment (and to the PDF documentation), the
> four bytes are not fixed but can be any values, preferably >128,
> so that the file is automatically detected as binary, not text. If I
> convert them into full-code Unicode characters
> \x00DF\x00D6\x00D5\x00EC, all is ok.
>
>
>
> "%\x00DF\x00D6\x00D5\x00EC ReportLab Generated PDF document http://www.reportlab.com"+LINEEND)
>
Escape codes without an "x" are in octal, not decimal. Hence,
the \223 in that first byte is \x93, not \xDF, which is why the error
refers to \x93. However, you can't really use \u0093 (a four-digit
escape is spelled \u; \x takes only two hex digits), because U+0093 is
a control character that won't map to a printable character in any 8-bit
encoding.
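
To make the octal point concrete, here is a small sketch. It assumes CPython 3, where bytes literals are separate from text strings (IronPython 2 treats str as Unicode, so it will not behave identically); the variable names are only illustrative.

```python
# Octal escapes: \223 is 2*64 + 2*8 + 3 = 147 = 0x93, not decimal 223.
header_bytes = b"\223\214\213\236"
as_hex = [hex(b) for b in header_bytes]

# The same digits misread as decimal numbers give four different bytes.
misread = bytes([223, 214, 213, 236])
misread_hex = [hex(b) for b in misread]

print(as_hex)       # ['0x93', '0x8c', '0x8b', '0x9e']
print(misread_hex)  # ['0xdf', '0xd6', '0xd5', '0xec']
```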
This is a tricky problem. The desire is to use characters that will map
to bytes greater than 127 when IronPython converts the string to an 8-bit
encoding for writing out to file. The original 4 bytes in Shift-JIS
actually map to two Unicode code points, U+6771 and U+4EAC (I think).
Thus, a faithful translation would actually read:
u"%\u6771\u4EAC ReportLab Generated..."
But that's not practical, because you'd have to have a Kanji code page
in place before IronPython could write this as an 8-bit file. (Or UTF-8.)
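
The mapping can be checked with a quick sketch, again assuming CPython 3 and its standard "shift_jis" codec rather than IronPython itself. It decodes the four header bytes and then tries to write the result back out in a few encodings.

```python
raw = b"\x93\x8c\x8b\x9e"          # the four header bytes

# Decode as Shift-JIS: four bytes become two Kanji code points.
text = raw.decode("shift_jis")
code_points = ["U+%04X" % ord(ch) for ch in text]

# Only a Japanese code page reproduces the original bytes.
round_trip = text.encode("shift_jis") == raw

# A Latin code page has no slot for these characters at all.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can encode them, but as six different high-order bytes.
utf8_bytes = text.encode("utf-8")

print(code_points, round_trip, latin1_ok, utf8_bytes)
```

So UTF-8 would still give bytes above 127, just not the four bytes the original header contains.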
The 4 characters you accidentally chose are a reasonable compromise;
those are Latin-1 characters (the German sharp s "ß", O with diaeresis
"Ö", O with tilde "Õ", and small i with grave accent "ì"). They will
work with most of the Latin and European code pages, but they won't work
in Far East code pages.
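
A rough way to see which code pages accept those four characters, sketched for CPython 3. The helper name encodable is made up for this example, and the three codecs are just a sample, not a full survey of code pages.

```python
chars = "\u00df\u00d6\u00d5\u00ec"   # sharp s, O diaeresis, O tilde, i grave


def encodable(text, codec):
    """Return True if every character in text fits in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


latin1 = chars.encode("latin-1")      # one byte per character
cp1252 = chars.encode("cp1252")       # Windows Western European code page
high_bytes = all(b > 127 for b in latin1)
far_east_ok = encodable(chars, "shift_jis")

print(latin1, cp1252, high_bytes, far_east_ok)
```

In Latin-1 and cp1252 each character becomes a single byte above 127, which is what the PDF header wants; the Shift-JIS codec rejects them.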
--
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.