[reportlab-users] V2 bullet
John J. Lee
jjlee at reportlab.com
Mon Jul 24 08:29:35 EDT 2006
On Mon, 24 Jul 2006, Mike Dewhirst wrote:
[...]
> Prof Google led me up many dark and twisty alleys and that led me to think
> I'm not really sure I want to allocate brain space to the innards of unicode.
> But I do want to be able to translate various symbols in various to codes which
> I then want to make Python emit.
>
> Can you suggest any links or directions to reading matter for people (ie me)
> with short attention spans?
There's a certain minimum you should really know, which is not difficult
to get your head around. You need to know what a codepoint is, and what
an encoding is, for example. Here's a good primer:
http://www.joelonsoftware.com/articles/Unicode.html
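To make the codepoint/encoding distinction concrete, here is a minimal sketch. It assumes Python 3, where str holds Unicode text and bytes holds encoded data; the examples in this post are Python 2, where those roles were played by unicode and str. One codepoint gives different bytes depending on the encoding chosen:

```python
# One character (one codepoint), several encodings: the codepoint is fixed,
# but the bytes that represent it depend on which encoding you pick.
ch = "\u00ae"  # REGISTERED SIGN, codepoint 174 (0xAE)

encoded = {enc: ch.encode(enc) for enc in ("latin-1", "utf-8", "utf-16-be")}
for enc, data in encoded.items():
    print(enc, data)  # latin-1 b'\xae', utf-8 b'\xc2\xae', utf-16-be b'\x00\xae'
```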
Then you need to know how that applies to Python:
http://effbot.org/zone/unicode-objects.htm
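In Python terms, the key habit is to decode bytes to text with the encoding they were actually written in. A short sketch, again assuming Python 3: decoding with the wrong codec often does not raise an error, it just silently produces the wrong characters.

```python
text = "\u00ae ReportLab Europe Ltd."  # str: a sequence of codepoints
data = text.encode("utf-8")            # bytes: that text encoded as UTF-8

# Decoding with the codec that produced the bytes round-trips exactly.
roundtrip = data.decode("utf-8")

# Every byte is valid latin-1, so this "works", but the two UTF-8 bytes
# of the sign come back as two separate characters (U+00C2 and U+00AE).
wrong = data.decode("latin-1")
print(roundtrip == text)  # True
print(wrong == text)      # False
```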
When you've absorbed that, rummage through the Unicode code charts to find
either the codepoint for the character you want, or the long "friendly
name" of the character (you can also use Windows' "Character Map"
utility):
http://www.unicode.org/charts/
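Once you have either the name or the character, Python's standard unicodedata module can convert between the two and give you the codepoint. A minimal sketch, assuming Python 3:

```python
import unicodedata

ch = unicodedata.lookup("REGISTERED SIGN")  # friendly name -> character
print(ch == "\u00ae")                       # True
print(ord(ch), hex(ord(ch)))                # 174 0xae
print(unicodedata.name(ch))                 # REGISTERED SIGN
print(f"U+{ord(ch):04X}")                   # U+00AE, the form the charts use
```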
Then, to include that character in a Python string, either use one of the
Unicode string escapes to type in the character, or encode the character
with whatever encoding you're using (e.g. UTF-8). For example, let's say I
want the character R with a circle around it (the registered trademark
sign). The codepoint for that is 174 (or AE in hexadecimal). Here are several
different ways of writing that, first as a Python unicode string object
(all these mean the same thing):
u"\N{Registered Sign} ReportLab Europe Ltd."
u'\xae ReportLab Europe Ltd.'
u'\u00ae ReportLab Europe Ltd.'
u'\U000000ae ReportLab Europe Ltd.'
The first uses the "friendly name" I mentioned above, the others use the
codepoint.
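You can check that these spellings really are the same string. A quick sketch, assuming Python 3, where the u prefix is optional (it is still accepted since Python 3.3) and plain string literals are already Unicode:

```python
# All four spellings produce the same string; the first character is U+00AE.
forms = [
    "\N{REGISTERED SIGN} ReportLab Europe Ltd.",  # by Unicode character name
    "\xae ReportLab Europe Ltd.",                 # 2-digit hex escape
    "\u00ae ReportLab Europe Ltd.",               # 4-digit hex escape
    "\U000000ae ReportLab Europe Ltd.",           # 8-digit hex escape
]
print(len(set(forms)))   # 1: only one distinct string
print(ord(forms[0][0]))  # 174
```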
Or you can write it as a regular string object encoded in UTF-8:
'\xc2\xae ReportLab Europe Ltd.'
How did I get the \xc2\xae at the start of that last example? I just used
Python to encode to UTF-8 from a unicode string I typed in:
>>> u"\N{Registered Sign}".encode("utf-8")
'\xc2\xae'
Here are the Python docs on these escape codes:
http://docs.python.org/ref/strings.html
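As for where the two bytes \xc2\xae come from: UTF-8 stores codepoints U+0080 through U+07FF as two bytes, 110xxxxx 10xxxxxx, with the codepoint's 11 bits split 5/6 across them. For 0xAE that gives 0xC0 | 0x02 = 0xC2 and 0x80 | 0x2E = 0xAE. The sketch below assumes Python 3 (which prints the result as b'\xc2\xae' rather than the Python 2 '\xc2\xae' shown above); utf8_two_byte is a hypothetical helper for illustration only, and in real code you would just call .encode("utf-8").

```python
def utf8_two_byte(codepoint: int) -> bytes:
    """Hypothetical helper: hand-encode a codepoint in U+0080..U+07FF as UTF-8."""
    if not 0x80 <= codepoint <= 0x7FF:
        raise ValueError("this sketch only handles U+0080..U+07FF")
    lead = 0xC0 | (codepoint >> 6)    # 110xxxxx: top 5 of the 11 bits
    cont = 0x80 | (codepoint & 0x3F)  # 10xxxxxx: low 6 bits
    return bytes([lead, cont])

print(utf8_two_byte(0xAE))                    # b'\xc2\xae'
print("\N{REGISTERED SIGN}".encode("utf-8"))  # b'\xc2\xae', same bytes
```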
> If not, I'll settle for dark and twisty stuff :)
There's lots of dark and twisty stuff about collation, normalisation, etc.
etc. etc., but most people get away with ignoring it all ;-)
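For the curious, here is one small taste of the normalisation issue: the same visible character can be stored as different codepoint sequences, which then compare unequal until normalised. A minimal sketch with the standard unicodedata module, assuming Python 3:

```python
import unicodedata

composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE: one codepoint
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT: two codepoints

print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```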
John