[reportlab-users] pyRXP vs ONIX DTD
Fri, 6 Dec 2002 10:08:24 +0200
On Thu, Dec 05, 2002 at 06:43:03PM +0000, Robin Becker wrote:
> >> pyRXP is open source so anybody could try and switch it to 16 bit.
> >(I'm not sure that's a good idea; Unicode is 20.1 bits wide, and UTF-16
> >combines all the disadvantages of both UTF-8 and UTF-32.)
> >Marius Gedminas
> I'm not sure I understand this really as I'm not an expert on encodings.
> And certainly not the author of RXP.
> Are you saying there's a unique utf-8 version of these 16bit things?
UTF-8 can technically express codes from 0 to 2^31 - 1, although
currently both Unicode and ISO 10646 limit their codespaces to
> had thought there were some problems. Is the byte sequence always
> defined bigendian/littleendian?
UTF-8 is endian-independent, that's one of its advantages (UTF-16 and
UTF-32, and also UCS-2 and UCS-4 come in two flavours -- big-endian and
little-endian. BTW, UCS-4 is identical to UTF-32, and UCS-2 differs
from UTF-16 in that UTF-16 can express characters above 0xffff using
surrogate pairs, while UCS-2 is limited to the Basic Multilingual Plane).
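
To make the byte-order and surrogate-pair points concrete, here is a rough
sketch in present-day Python 3 (not the Python of 2002, and nothing to do
with pyRXP's internals). The character U+1D11E and the helper name
surrogate_pair are just illustrative choices, not anything from RXP.

```python
# Illustrative sketch only. U+1D11E (MUSICAL SYMBOL G CLEF) is an arbitrary
# example of a character outside the Basic Multilingual Plane.
ch = "\U0001D11E"

# UTF-8 is a byte stream, so there is only one possible byte sequence.
utf8 = ch.encode("utf-8")

# UTF-16 and UTF-32 work in 16-bit and 32-bit units, so each has a
# big-endian and a little-endian serialisation.
utf16_be = ch.encode("utf-16-be")
utf16_le = ch.encode("utf-16-le")
utf32_be = ch.encode("utf-32-be")
utf32_le = ch.encode("utf-32-le")


def surrogate_pair(code_point):
    """Split a code point above 0xFFFF into UTF-16 high/low surrogates.

    Hypothetical helper written for this example.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


high, low = surrogate_pair(ord(ch))

print("UTF-8    ", utf8.hex())
print("UTF-16 BE", utf16_be.hex(), " UTF-16 LE", utf16_le.hex())
print("UTF-32 BE", utf32_be.hex(), " UTF-32 LE", utf32_le.hex())
print("surrogates %04X %04X" % (high, low))
```

The two UTF-16 outputs hold the same pair of 16-bit units with the bytes
inside each unit swapped, which is why a decoder needs to know (or detect)
the byte order, while the UTF-8 output needs no such information. A strict
UCS-2 consumer has no way to represent this character at all.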
Aren't encodings fun? It's almost as bad as understanding time
measurements (leap seconds, oh my...).
Those parts of the system that you can hit with a hammer (not advised)
are called hardware; those program instructions that you can only curse
at are called software.
-- Levitating Trains and Kamikaze Genes: Technological
Literacy for the 1990's.