[reportlab-users] V2 bullet

John J. Lee jjlee at reportlab.com
Mon Jul 24 08:29:35 EDT 2006


On Mon, 24 Jul 2006, Mike Dewhirst wrote:
[...]
> Prof Google led me up many dark and twisty alleys and that led me to think 
> I'm not really sure I want to allocate brain space to the innards of unicode. 
> But I do want to be able to translate various symbols in various to codes which 
> I then want to make Python emit.
>
> Can you suggest any links or directions to reading matter for people (ie me) 
> with short attention spans?

There's a certain minimum you should really know, which is not difficult 
to get your head around.  You need to know what a codepoint is, and what 
an encoding is, for example.  Here's a good primer:

http://www.joelonsoftware.com/articles/Unicode.html


Then you need to know how that applies to Python:

http://effbot.org/zone/unicode-objects.htm


When you've absorbed that, rummage through the unicode code charts to find 
either the codepoint for the character you want, or the long "friendly 
name" of the character (you can also use Windows' "Character Map" 
utility):

http://www.unicode.org/charts/
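
If paging through the charts gets tedious, Python's standard unicodedata module can do the same lookup in either direction. This is only a rough sketch, and it assumes Python 3 (where every str is a unicode string) rather than the Python 2 syntax used in the rest of this message; the variable names are just for illustration:

```python
import unicodedata

# Friendly name -> character -> codepoint
ch = unicodedata.lookup("REGISTERED SIGN")
codepoint = ord(ch)

# Codepoint -> character -> friendly name
name = unicodedata.name(chr(0xAE))

print(hex(codepoint), name)
```

unicodedata.lookup() raises KeyError if the name isn't one it knows, so a typo in the friendly name shows up straight away.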


Then, to include that character in a Python string, either type it in using 
one of the unicode string escapes, or write out the bytes you get by encoding 
the codepoint in the encoding you're using (e.g. UTF-8).  For example, let's 
say I want the character R with a circle around it (the registered trademark 
sign).  The codepoint for that is 174 (AE in hexadecimal, usually written 
U+00AE).  Here are several different ways of writing that, first as a Python 
unicode string object (all these mean the same thing):

u"\N{Registered Sign} ReportLab Europe Ltd."
u'\xae ReportLab Europe Ltd.'
u'\u00ae ReportLab Europe Ltd.'
u'\U000000ae ReportLab Europe Ltd.'


The first uses the "friendly name" I mentioned above, the others use the 
codepoint.
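
If you want to convince yourself that the four spellings really are the same string, a quick check along these lines should do it. Again this is a sketch assuming Python 3, where the u prefix is optional and so left off; the variable names are made up for the example:

```python
# Four spellings of the same string, one per escape style
by_name = "\N{REGISTERED SIGN} ReportLab Europe Ltd."
by_hex = "\xae ReportLab Europe Ltd."
by_u4 = "\u00ae ReportLab Europe Ltd."
by_u8 = "\U000000ae ReportLab Europe Ltd."

# Chained comparison: True only if every pair is equal
all_equal = by_name == by_hex == by_u4 == by_u8

# The first character is one codepoint, whichever escape produced it
first_codepoint = ord(by_name[0])

print(all_equal, first_codepoint)
```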

Or you can write it as a regular (byte) string object holding the UTF-8 encoding:

'\xc2\xae ReportLab Europe Ltd.'


How did I get the \xc2\xae at the start of that last example?  I just used 
Python to encode to UTF-8 from a unicode string I typed in:

>>> u"\N{Registered Sign}".encode("utf-8")
'\xc2\xae'
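
The same round trip makes the codepoint-versus-encoding distinction concrete: one codepoint, different bytes depending on the encoding you pick. A minimal sketch, assuming Python 3 (where encode() returns a bytes object, shown as b'...'); the variable names are only illustrative:

```python
text = "\N{REGISTERED SIGN} ReportLab Europe Ltd."

# Same codepoint (U+00AE), two different byte representations
utf8_bytes = text.encode("utf-8")      # starts with the two bytes C2 AE
latin1_bytes = text.encode("latin-1")  # starts with the single byte AE

# Decoding with the matching encoding gives back the original text
round_trip = utf8_bytes.decode("utf-8")

print(utf8_bytes[:2], latin1_bytes[:1], round_trip == text)
```

Decoding with the wrong encoding is where most of the classic garbage characters come from, so it pays to know which encoding your bytes are in.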


Here are the Python docs on these escape codes:

http://docs.python.org/ref/strings.html


> If not, I'll settle for dark and twisty stuff :)

There's lots of dark and twisty stuff about collation, normalisation, etc. 
etc. etc., but most people get away with ignoring it all ;-)


John

