[reportlab-users] pyRXP vs ONIX DTD
Thu, 5 Dec 2002 14:17:22 +0200
On Thu, Dec 05, 2002 at 10:25:59AM +0000, Robin Becker wrote:
> In article <20021205095951.GA26876@codeworks.lt>, Marius Gedminas
> <firstname.lastname@example.org> writes
> >It's been some time since I last read XML specs, so I could be very
> >wrong, but I seem to remember that XML is basically inseparable from
> >Marius Gedminas
> You're probably right, but that's why we have utf-8 ie an 8 bit
> encoding. The right thing to do is to switch this to 16 bit or perhaps
> 32 bit or 64 bit, handle all known BOM's and then watch paint dry. Eight
> bit encodings were always sufficient, but modernists want to use up all
> their new computing power :).
The problem is character entities like '&#1234;', right? So why not just
translate them to UTF-8 strings instead of throwing exceptions? That's
assuming pyRXP works with UTF-8 internally; I'm not familiar with it,
and probably should abstain from discussing things I know very little
about. I only wanted to clarify that &#9999; is OK even when the
document is declared as <?xml version="1.0" encoding="ISO-8859-1"?>.
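
To make that concrete, here is a minimal sketch of the point. It uses the
standard library's xml.etree.ElementTree rather than pyRXP (whose internals
I don't know), and the sample document is made up for illustration. The
document is declared as ISO-8859-1, yet the numeric character reference
&#9999; (U+270F) parses without trouble and can then be written out as UTF-8.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: declared as ISO-8859-1, with one real Latin-1 byte
# (0xE9, e-acute) and one numeric character reference to U+270F, a
# character that Latin-1 itself cannot encode.
doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9 &#9999;</p>'

root = ET.fromstring(doc)

# The parser hands back Unicode text; the character reference has been
# resolved to the code point it names, regardless of the declared encoding.
text = root.text

# "Translating to UTF-8" is then just an encode step.
utf8 = text.encode("utf-8")

print(repr(text))
print(utf8)
```

The declared encoding only says how the bytes of the file map to
characters; a character reference names a code point directly, so it is
not limited by that encoding.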
> pyRXP is open source so anybody could try and switch it to 16 bit.
(I'm not sure that's a good idea; Unicode is 20.1 bits wide, and UTF-16
combines all the disadvantages of both UTF-8 and UTF-32.)
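
A quick sketch of what I mean, using nothing but Python's built-in codecs;
the two sample characters are arbitrary picks of mine. It computes the
width of the Unicode code space and compares how many bytes each encoding
needs for a character inside and outside the Basic Multilingual Plane.

```python
import math

# Unicode code points run from 0 to 0x10FFFF inclusive.
MAX_CODE_POINT = 0x10FFFF
bits = math.log2(MAX_CODE_POINT + 1)  # a little over 20

bmp = "\u270f"         # U+270F, inside the Basic Multilingual Plane
astral = "\U0001F600"  # U+1F600, outside it (needs a surrogate pair in UTF-16)

# Bytes needed per character in (UTF-8, UTF-16, UTF-32).
# The -le variants are used so that no BOM is counted.
sizes = {
    name: (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )
    for name, ch in (("bmp", bmp), ("astral", astral))
}

print(f"{bits:.2f} bits")
print(sizes)
```

So UTF-16 is variable-width like UTF-8 (two or four bytes per character),
and like UTF-32 it is not ASCII-compatible and has byte order to worry
about.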
If you can't understand it, it is intuitively obvious.