[reportlab-users] Font mappings, encodings, Macs - r1.20

Andy Robinson reportlab-users@reportlab.com
Fri, 28 May 2004 17:09:55 +0100


> Wouldn't it be simpler to allow the use of Unicode strings rather than
> deal with charset conversions?  With Unicode string objects a number of
> risks are eliminated, e.g. you won't accidentally pass in a Latin-1
> string in a place that expects UTF-8 or vice versa.  And the change
> shouldn't break existing apps as they do not use Unicode.  Also, some
> other problems with multibyte encodings (like wrapping a line in the
> middle of a UTF-8 character) magically disappear if you start
> manipulating Unicode instead.

The non-TrueType fonts all have 8-bit encodings, so we have
to convert internally even if we only accept Unicode as input.
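
As a rough illustration of that internal conversion (a sketch only, not
ReportLab code): the helper name to_font_bytes is made up, and I am
assuming the font's 8-bit encoding is close enough to Python's 'cp1252'
codec, with str as the Unicode string type.

```python
def to_font_bytes(text, font_encoding="cp1252"):
    """Encode Unicode text into the single-byte encoding a font expects.

    Hypothetical helper: 'cp1252' stands in for the font's real encoding.
    Characters the encoding cannot represent are replaced with '?'
    instead of raising UnicodeEncodeError.
    """
    return text.encode(font_encoding, errors="replace")


# e-acute has a single-byte code in cp1252, so it survives the conversion.
print(to_font_bytes("caf\u00e9"))

# These two CJK characters have no cp1252 code, so each becomes '?'.
print(to_font_bytes("\u65e5\u672c"))
```

The point is that the conversion is lossy for anything outside the
font's encoding, whichever input type we accept.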

> 
> Are there any advantages in dealing with 8-bit strings over Unicode
> strings that I am missing?

My feeling is that we should "allow" but not "require" Unicode
strings.  A Japanese user with Shift-JIS data plus a few
in-house extensions, and Shift-JIS fonts (plus the same few in-house
extensions) risks corruption in obscure corner cases if their data is
forced through a Unicode conversion, but will be quite happy if
the 'straight through' path is preserved.  It still happens!

I think that at some future date we should have two default inputs:
if it's a Unicode string, it's treated as Unicode, and if it's an 8-bit
string, it's assumed to be UTF-8.  But in the first release there
will be some 'autoConvertEncoding' flag to enable or disable this
behaviour.
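
Roughly, the dispatch I have in mind could look like the sketch below.
Nothing here exists yet: normalise_input and its auto_convert parameter
are placeholder names standing in for whatever the 'autoConvertEncoding'
flag ends up being, and I am writing it with str as the Unicode type
and bytes as the 8-bit string.

```python
def normalise_input(text, auto_convert=True, assumed_encoding="utf-8"):
    """Return text in the form the rest of the pipeline would consume.

    Hypothetical sketch of the proposed behaviour:
    - Unicode strings are passed on unchanged.
    - 8-bit strings are decoded as UTF-8 when auto_convert is on.
    - With auto_convert off, 8-bit strings go 'straight through'
      untouched, so data in a private encoding is never round-tripped.
    """
    if isinstance(text, str):
        return text
    if auto_convert:
        # Raises UnicodeDecodeError if the bytes are not valid UTF-8.
        return text.decode(assumed_encoding)
    return text


# A user with Shift-JIS data and Shift-JIS fonts disables the conversion.
sjis_data = "\u65e5\u672c\u8a9e".encode("shift_jis")
untouched = normalise_input(sjis_data, auto_convert=False)
```

With the flag off, the bytes that come out are the bytes that went in,
which is the 'straight through' path described above.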

Thanks,

Andy

P.S. I'm away for one week; I look forward to the consensus
and/or wish-list when I get back!