[reportlab-users] TTF, Platypus & Unicode...

Marius Gedminas reportlab-users@reportlab.com
Tue, 30 Jul 2002 12:30:02 +0200


On Tue, Jul 30, 2002 at 11:23:19AM +0200, Dinu Gherman wrote:
> I'm trying to reuse TrueType fonts in Platypus and find myself
> diving deep into Unicode adventures. Can someone explain how to
> convert what I suppose to be Latin-1 into UTF-8 (I thought that
> would be the same, but maybe not for TrueType fonts...)?

Latin-1 and UTF-8 are different encodings.  (The first 128 characters,
i.e. plain ASCII, are encoded identically, but every Latin-1 character
from 0x80 up takes two bytes in UTF-8.)  If you have Python 1.6 or later
you can write

  utf8_string = unicode(latin1_string, "ISO-8859-1").encode("UTF-8")

I'm not sure if there's a simpler way to do that, or if Python 1.5 has
any functions to help you work with UTF-8.  It shouldn't be difficult to
write a conversion routine from Latin-1 to UTF-8:

  if 0x00 <= char < 0x80:   output char
  if 0x80 <= char < 0xC0:   output 0xC2, then char
  if 0xC0 <= char < 0x100:  output 0xC3, then (char - 64)
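
The three rules above could be sketched in Python roughly as follows.
This is only an illustration, written for Python 3 (where iterating over
a bytes object yields integers); the function name latin1_to_utf8 is made
up for the example.  Note that the middle range has to run up to, but not
include, 0xC0, so that 0xBF is still covered.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Convert Latin-1 encoded bytes to UTF-8 by hand (illustrative sketch)."""
    out = bytearray()
    for char in data:
        if char < 0x80:
            # ASCII range: identical in both encodings
            out.append(char)
        elif char < 0xC0:
            # 0x80..0xBF: lead byte 0xC2, then the byte unchanged
            out.append(0xC2)
            out.append(char)
        else:
            # 0xC0..0xFF: lead byte 0xC3, then the byte minus 64
            out.append(0xC3)
            out.append(char - 64)
    return bytes(out)


# Compare against the built-in codecs for all 256 possible byte values.
all_bytes = bytes(range(256))
matches_builtin = (
    latin1_to_utf8(all_bytes)
    == all_bytes.decode("iso-8859-1").encode("utf-8")
)
```

If the built-in codecs are available they are the better choice; the
hand-written loop is mainly useful for seeing what the conversion does.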

See `man utf-8' if you have a Linux machine nearby.  Another good
reference is http://www.cl.cam.ac.uk/~mgk25/unicode.html

It would not be hard to enable the direct usage of Latin-1 texts with
TTF fonts but I do not want to do that on principle.  Let the user think
about encodings instead of limiting the choices to an arbitrary subset.
(Besides, it might be too easy to confuse Latin-1 with Windows-1252.)
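
To illustrate how the two can be confused, here is a small sketch
(Python 3, using the standard "iso-8859-1" and "cp1252" codec names; the
variable names are only for the example).  The same byte decodes to
different characters, and therefore to different UTF-8 output, depending
on which encoding you assume the input is in.

```python
raw = b"\x80"  # a single byte outside the ASCII range

# Interpreted as Latin-1, 0x80 is the control character U+0080.
as_latin1 = raw.decode("iso-8859-1")

# Interpreted as Windows-1252, 0x80 is the euro sign U+20AC.
as_cp1252 = raw.decode("cp1252")

utf8_from_latin1 = as_latin1.encode("utf-8")   # two bytes
utf8_from_cp1252 = as_cp1252.encode("utf-8")   # three bytes

print(repr(utf8_from_latin1))
print(repr(utf8_from_cp1252))
```

The two encodings only disagree in the range 0x80..0x9F, which is why
the mix-up often goes unnoticed until a euro sign or a "smart quote"
turns up in the text.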

Marius Gedminas
-- 
Given enough eyeballs all bugs are shallow.
		-- Eric S. Raymond, "The Cathedral and the Bazaar"