[reportlab-users] Re: quick hack on para.py

Wed Jul 5 08:16:42 EDT 2006

Dirk Holtwick wrote:
> Robin Becker schrieb:
>> the important one (at least for most Europeans) grrhhhhhh. I looked this
>> up. According to
>> http://www.unicode.org/Public/MAPPINGS/VENDORS/ADOBE/symbol.txt the EURO
>> SIGN has a codepoint 0x20ac, but your code is assuming 0x80 somehow. I
>> think the latin1 is wrong. Things work for me if I use cp1252 throughout.
>>
>> I have added the euro as a predefined entity. That will at least allow
>> you to do  &euro;
> 
> 
> Hi Robin,
> 
> how about adding this little helper function?
> 
> import types
> def toUnicode(s):
>     if type(s) is not types.UnicodeType:
>         s = unicode(s, "utf8")
>     return s.replace(u"\x80", u"\u20ac")
> 
> The Euro sign in my examples comes directly from my German Windows
> keyboard and worked fine until now in other apps. Maybe it's a bug in
> Python?
> 
> Bye, Dirk
> 
......
It's not a bug in Python. You explicitly used latin1. When we decode the string
we get

unicode("€", 'latin1') --> u'\x80'

but u'\x80' is a C1 control code in Unicode, not the euro sign. According to
Wikipedia

http://en.wikipedia.org/wiki/Western_Latin_character_sets_%28computing%29

The euro isn't defined in latin1, but because the latin1 codec maps byte 0x80
straight to code point 0x80, we end up with the wrong Unicode character. My
claim is that your Windows is using an encoding which maps the euro sign to
0x80 and displays it as such, but that encoding is not latin1.
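
A minimal sketch of the difference, written in Python 3 syntax (the code in
this thread uses Python 2's unicode() instead). The byte value 0x80 comes from
the discussion above; the variable names are only illustrative.

```python
# The single byte under discussion; assumed to be what the keyboard produced.
raw = b"\x80"

# latin1 maps every byte to the code point with the same number,
# so 0x80 becomes U+0080, a control code rather than the euro sign.
as_latin1 = raw.decode("latin1")

# cp1252 maps 0x80 to U+20AC EURO SIGN.
as_cp1252 = raw.decode("cp1252")

print(repr(as_latin1))
print(repr(as_cp1252))

# Going the other way, latin1 has no byte for the euro sign at all.
try:
    "\u20ac".encode("latin1")
    euro_fits_latin1 = True
except UnicodeEncodeError:
    euro_fits_latin1 = False
```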

My suggestion of cp1252 may not be what you're actually using, but that at least 
does the right thing for the euro and many other characters.
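
As a rough illustration of using cp1252 throughout, here is a variant of the
helper proposed above, again in Python 3 syntax. The name to_text and its
default encoding are assumptions made for this sketch; nothing like it exists
in para.py.

```python
def to_text(s, encoding="cp1252"):
    """Return s as text, decoding byte strings with the given encoding.

    Hypothetical helper: cp1252 is only a guess at the input encoding.
    """
    if isinstance(s, bytes):
        return s.decode(encoding)
    return s


price = to_text(b"10 \x80")            # euro sign typed on a cp1252 system
quoted = to_text(b"\x93quoted\x94")    # curly quotes, also outside latin1
unchanged = to_text("10 \u20ac")       # text passes through untouched
```

With the right codec the replace() call on u"\x80" is no longer needed. Note
that a few byte values (0x81, for instance) are undefined in Python's cp1252
codec and raise UnicodeDecodeError, so this is not a catch-all.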
-- 
Robin Becker