[reportlab-users] pyRXP vs ONIX DTD

Robin Becker reportlab-users@reportlab.com
Thu, 5 Dec 2002 18:43:03 +0000


In article <20021205121722.GA27836@codeworks.lt>, Marius Gedminas
<marius@codeworks.lt> writes
........
>
>The problem is character entities like '&#1234;', right?  So why not just
>translate them to UTF-8 strings instead of throwing exceptions?  That's
>assuming pyRXP works with UTF-8 internally; I'm not familiar with it,
>and probably should abstain from discussing things I know very little
>about.  I only wanted to clarify that &#9999; is OK even when the
>document is declared as <?xml version="1.0" encoding="ISO-8859-1"?>.
>
>See http://www.w3.org/TR/REC-xml#dt-charref
>
>> pyRXP is open source so anybody could try and switch it to 16 bit.
>
>(I'm not sure that's a good idea; Unicode is 20.1 bits wide, and UTF-16
>combines all the disadvantages of both UTF-8 and UTF-32.)
>
>Marius Gedminas
I'm not sure I really understand this, as I'm not an expert on encodings,
and I'm certainly not the author of RXP.

Are you saying there's a unique UTF-8 version of these 16-bit things? I
had thought there were some problems. Is the byte sequence always
defined (big-endian/little-endian)?
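
For what it's worth, the question can be checked with a small sketch in
Python 3. This is only an illustration and says nothing about what pyRXP
does internally; the helper name expand_char_refs is made up for the
example. It expands numeric character references into characters and then
encodes the result: the UTF-8 form is a single fixed byte sequence with no
byte-order variants, whereas UTF-16 comes in big-endian and little-endian
forms.

```python
import re

# Matches XML numeric character references: decimal (&#1234;) or hex (&#x4D2;).
_CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")

def expand_char_refs(text):
    """Replace numeric character references with the characters they name.

    Hypothetical helper for illustration only; not part of pyRXP.
    """
    def repl(match):
        body = match.group(1)
        code = int(body[1:], 16) if body.startswith("x") else int(body)
        return chr(code)
    return _CHAR_REF.sub(repl, text)

char = expand_char_refs("&#9999;")      # code point U+270F

utf8 = char.encode("utf-8")             # one possible byte sequence
utf16_be = char.encode("utf-16-be")     # byte order matters for UTF-16
utf16_le = char.encode("utf-16-le")

print(utf8, utf16_be, utf16_le)
```

The declared document encoding (ISO-8859-1 in the example quoted above)
only governs how the file's bytes are read; the reference itself names a
Unicode code point, so the UTF-8 result is the same either way.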
-- 
Robin Becker