[reportlab-users] Font mappings, encodings, Macs - r1.20
Andy Robinson
reportlab-users@reportlab.com
Fri, 28 May 2004 17:09:55 +0100
> Wouldn't it be simpler to allow the use of Unicode strings rather than
> deal with charset conversions? With Unicode string objects a number of
> risks are eliminated, e.g. you won't accidentally pass in a Latin-1
> string in a place that expects UTF-8 or vice versa. And the change
> shouldn't break existing apps as they do not use Unicode. Also, some
> other problems with multibyte encodings (like wrapping a line in the
> middle of a UTF-8 character) magically disappear if you start
> manipulating Unicode instead.
The non-TrueType fonts all have 8-bit encodings, so we have
to convert internally even if we accept only Unicode as input.
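
To make the internal conversion concrete, here is a minimal sketch in
modern Python 3 (where `str` is Unicode and `bytes` is the 8-bit string).
The helper name `to_font_bytes` and the choice of `cp1252` as a stand-in
for a font's 8-bit encoding are assumptions for illustration only; this
is not ReportLab's API.

```python
# Hypothetical helper, not part of ReportLab's API.
def to_font_bytes(text, font_encoding="cp1252", errors="replace"):
    """Encode a Unicode string into the 8-bit encoding a font expects.

    font_encoding is an assumed stand-in; a real font may use a
    different 8-bit encoding.
    """
    return text.encode(font_encoding, errors)


# A character the 8-bit encoding covers maps to a single byte.
latin = to_font_bytes("caf\u00e9")

# Characters outside the 8-bit encoding cannot be represented; with
# errors="replace" each one becomes a '?' byte.
unmapped = to_font_bytes("\u65e5\u672c")
```

Even with Unicode-only input, characters the font's encoding lacks
still need some policy (replace, skip, or raise) at this step.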
>
> Are there any advantages in dealing with 8-bit strings over Unicode
> strings that I am missing?
My feeling is that we should "allow" but not "require" Unicode
strings. A Japanese user with Shift-JIS data plus a few
in-house extensions, and Shift-JIS fonts (with the same few in-house
extensions), risks corruption in obscure corner cases if their data is
forced through a Unicode conversion, but will be quite happy if
the 'straight through' path is preserved. It still happens!
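
One way to see the risk is to check whether a given byte string
survives a trip through Unicode unchanged. The sketch below is in
Python 3; the function name `round_trips` is hypothetical, and the
in-house extension bytes are invented for illustration. What a codec
does with such bytes depends on the codec, so no result is claimed
for them here.

```python
# Hypothetical check, for illustration only.
def round_trips(data, encoding):
    """Return True if bytes -> Unicode -> bytes gives back the same bytes."""
    try:
        return data.decode(encoding).encode(encoding) == data
    except UnicodeError:
        # The codec could not decode or re-encode some part of the data.
        return False


# Standard Shift-JIS text survives the conversion.
standard = "\u65e5\u672c\u8a9e".encode("shift_jis")
standard_ok = round_trips(standard, "shift_jis")

# An invented in-house extension code point in a user-defined range.
# Whether it survives depends on the codec used, which is the problem:
# the straight-through path never asks the question.
in_house = b"\xf0\x40"
in_house_ok = round_trips(in_house, "shift_jis")
```

If a check like this fails for real data, passing the bytes straight
through to a font with the matching encoding avoids the loss.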
I think that at some future date we should have two default inputs:
if it's a Unicode string, it is treated as Unicode, and if it's an
8-bit string, it is assumed to be UTF-8. In the first release, though,
there will be some 'autoConvertEncoding' flag to enable/disable the
behaviour.
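
As a rough sketch of that dispatch, written in Python 3 terms (`str`
for Unicode, `bytes` for 8-bit strings): the function name
`normalize_text` is hypothetical, and the exact semantics of the flag
are an assumption, since only its name and its on/off purpose are
settled above.

```python
# Hypothetical sketch of the proposed behaviour, not an existing API.
def normalize_text(value, auto_convert_encoding=True):
    """Apply the two default inputs described above.

    - Unicode strings are accepted as they are.
    - 8-bit strings are assumed to be UTF-8 and decoded, unless the
      flag is off, in which case they go straight through untouched.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        if auto_convert_encoding:
            # Raises UnicodeDecodeError if the bytes are not valid UTF-8.
            return value.decode("utf-8")
        return value
    raise TypeError("expected a Unicode or 8-bit string")


from_unicode = normalize_text("caf\u00e9")
from_utf8 = normalize_text(b"caf\xc3\xa9")
# With the flag off, Shift-JIS bytes are left alone.
straight_through = normalize_text(b"\x93\xfa", auto_convert_encoding=False)
```

With the flag off, the caller stays responsible for matching the
bytes to the font's encoding, which is the 'straight through' case.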
Thanks,
Andy
P.S. I'm away for one week; I look forward to the consensus
and/or wish-list when I get back!