[reportlab-users] Re: quick hack on para.py
Robin Becker
robin at reportlab.com
Wed Jul 5 08:16:42 EDT 2006
Dirk Holtwick wrote:
> Robin Becker wrote:
>> the important one (at least for most Europeans) grrhhhhhh. I looked this
>> up. According to
>> http://www.unicode.org/Public/MAPPINGS/VENDORS/ADOBE/symbol.txt the EURO
>> SIGN has a codepoint 0x20ac, but your code is assuming 0x80 somehow. I
>> think the latin1 is wrong. Things work for me if I use cp1252 throughout.
>>
>> I have added the euro as a predefined entity. That will at least allow
>> you to do &euro;
>
>
> Hi Robin,
>
> how about adding this little helper function?
>
> import types
> def toUnicode(s):
>     if type(s) is not types.UnicodeType:
>         s = unicode(s, "utf8")
>     return s.replace(u"\x80", u"\u20ac")
>
> The Euro sign in my examples comes directly from my German Windows
> keyboard and worked fine until now in other apps. Maybe it's a bug in
> Python?
>
> Bye, Dirk
>
......
It's not a bug in Python. You explicitly used latin1. When we decode the string,
unicode("€", 'latin1') --> u'\x80', but U+0080 is a control character, not the
euro sign, so nothing downstream treats it as a euro. According to Wikipedia
http://en.wikipedia.org/wiki/Western_Latin_character_sets_%28computing%29
the euro isn't defined in latin1, but because the latin1 codec maps 0x80 -->
0x80 we end up with the wrong Unicode character. My claim is that your Windows
is using an encoding which maps the euro sign to 0x80 and displays it as such,
but that encoding is not latin1.
My suggestion of cp1252 may not be what you're actually using, but it at least
does the right thing for the euro and many other characters.
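
To make the difference concrete, here is a minimal sketch of how the two codecs treat the single byte 0x80. It is written for Python 3 (bytes.decode rather than the unicode() call used above), and the variable names are only illustrative, not anything in para.py.

```python
# The byte a cp1252 (Windows Western) keyboard produces for the euro sign.
raw = b"\x80"

# latin1 maps every byte to the code point with the same number,
# so 0x80 becomes U+0080, a C1 control character.
as_latin1 = raw.decode("latin-1")

# cp1252 assigns 0x80 to U+20AC EURO SIGN.
as_cp1252 = raw.decode("cp1252")

# Going the other way, latin1 has no slot for the euro at all.
try:
    "\u20ac".encode("latin-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False

# ascii() keeps the output readable whatever the terminal encoding is.
print(ascii(as_latin1), ascii(as_cp1252), latin1_has_euro)
```

If the input really is cp1252, decoding with that codec makes a later replace of u"\x80" by u"\u20ac" unnecessary.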
--
Robin Becker