[reportlab-users] Platypus & unicode

Andrew Smart smart at smart-knowhow.de
Tue Nov 13 19:18:19 EST 2007


Hi folks,

this is probably a common question, but I haven't found a clear answer in the
various resources.

Probably a misunderstanding on my side.

I use Platypus, and I'm reading text files from disk. Those text files
are encoded in cp1252.

I load the text files as lines into memory, and I'm ensuring that every
single string is converted correctly to unicode - using the
text = unicode(text, "cp1252")
statement.
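
For reference, a minimal sketch of the loading step. The helper name
read_cp1252_lines and the throwaway demo file are made up for this
illustration; codecs.open does the decoding while reading, which should be
equivalent to calling unicode(text, "cp1252") on every line by hand.

```python
import codecs
import os
import tempfile


def read_cp1252_lines(path):
    # Hypothetical helper: codecs.open decodes cp1252 while reading,
    # so each returned line is already a text (unicode) string.
    with codecs.open(path, "r", encoding="cp1252") as handle:
        return [line.rstrip("\r\n") for line in handle]


# Demo with a temporary file holding two cp1252-encoded lines.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as out:
    out.write(b"caf\xe9\nna\xefve\n")

demo_lines = read_cp1252_lines(demo_path)
os.remove(demo_path)
```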

When I feed these strings into the Platypus framework I get unicode decode
errors on various occasions. Looking at the sources, I see that the
various .split() and .join() statements create "str" strings out of my
unicode strings. Those strings are then decoded back into unicode using the
"utf-8" encoding, and this is where the conversion breaks.
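
A small sketch of the kind of failure I mean, outside of Platypus. This only
reproduces the symptom with plain Python; I have not verified that it is
exactly what happens inside the library. The sample bytes are made up.

```python
raw = b"caf\xe9"             # "cafe" with e-acute, as stored in a cp1252 file
text = raw.decode("cp1252")  # decoding with the right codec works

# Decoding the same bytes as UTF-8 fails, because 0xE9 on its own
# is not a valid UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Bytes that were *encoded* as UTF-8 decode back without trouble.
roundtrip = text.encode("utf-8").decode("utf-8")
```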

Obviously, since my encoding is based on cp1252.

Arg.

What I understand: because of the unicode-str-unicode conversions inside
Platypus, it is not the best idea to start with "cp1252".

Right?

Here is where my misunderstanding starts... I thought that the internal
unicode representation is independent of the encoding used to store the
strings as byte sequences, e.g. in files. So splitting a cp1252-encoded
string "internally" inside Python routines should create "str" objects which
can be joined and decoded back into unicode without hassle...

But never mind.

Simple question: do I have to use UTF-8 encoded strings as input for
Platypus?
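
If the answer is yes, the workaround I would try looks roughly like this. It
is a sketch under the assumption that Platypus accepts UTF-8 encoded byte
strings; cp1252_to_utf8 is a made-up helper name, not a ReportLab function.

```python
def cp1252_to_utf8(raw_bytes):
    # Decode with the file's real encoding, then re-encode as UTF-8.
    return raw_bytes.decode("cp1252").encode("utf-8")


# 0xE9 is e-acute and 0x80 is the euro sign in cp1252.
utf8_line = cp1252_to_utf8(b"caf\xe9 \x80 5")
```

The resulting byte string can be decoded as "utf-8" again without errors,
which is the step that currently breaks for me.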

Any pointers are greatly appreciated :-))

Andrew



