[reportlab-users] pyRXP vs ONIX DTD
Marius Gedminas
reportlab-users@reportlab.com
Thu, 5 Dec 2002 11:59:51 +0200
On Thu, Dec 05, 2002 at 09:25:03AM +0000, Robin Becker wrote:
> ...... Well I think the error message says it all. The document says
> it's utf-8 and then tries to expand a non-8 bit char. I suppose we
> have to say that's impossible.
>
> >>> pyRXP.Parser()("<a>&#255;</a>")
> ('a', None, ['\xff'], None)
> >>> pyRXP.Parser()("<a>&#256;</a>")
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in ?
> Error: Error: 0x100 is not a valid 8-bit XML character
>  in unnamed entity at line 1 char 10 of [unknown]
Um, as far as I remember XML, numeric character references (the &#NNN;
form) are independent of the document's charset and are always
interpreted as Unicode code points.
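
To illustrate the point with a parser other than pyRXP, here is a minimal
sketch, assuming Python 3 and the standard library's expat-based
xml.etree.ElementTree. The helper name root_text is made up for the
example. The same reference, &#256;, is parsed under two different
declared encodings and should resolve to U+0100 both times.

```python
import xml.etree.ElementTree as ET


def root_text(doc_bytes):
    """Parse an XML document given as bytes and return the root's text."""
    return ET.fromstring(doc_bytes).text


# Same numeric character reference, two different declared encodings.
latin1_doc = b'<?xml version="1.0" encoding="iso-8859-1"?><a>&#256;</a>'
utf8_doc = b'<?xml version="1.0" encoding="utf-8"?><a>&#256;</a>'

latin1_text = root_text(latin1_doc)
utf8_text = root_text(utf8_doc)

# Print the code point each parse produced.
print(hex(ord(latin1_text)), hex(ord(utf8_text)))
```

The declared encoding only says how the bytes of the file map to
characters; the number inside the reference is a Unicode code point
either way.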
> I'm not an expert on XML; that's too big and vague a subject for a simple
> person like myself. The poor parser is 8-bit only at present, so it
> will not handle entity declarations like
>
> <!ENTITY bdquo "&#8222;"> <!-- double low-9 quotation mark, U+201E NEW -->
Then it's not an XML parser, is it?
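
For comparison, a sketch of the same kind of declaration going through a
Unicode-aware parser, again assuming Python 3 and xml.etree.ElementTree
rather than pyRXP. The document string is invented for the example; only
the bdquo declaration (value &#8222;, i.e. U+201E) comes from the quoted
text.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring bdquo via a numeric character reference,
# followed by a root element that uses the entity.
doc = (
    '<!DOCTYPE a [<!ENTITY bdquo "&#8222;">]>'
    '<a>&bdquo;quoted</a>'
)

# The parser expands &bdquo; to the single character U+201E.
text = ET.fromstring(doc).text
print(ascii(text))
```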
> I assume there is some universal encoding in which this is understandable.
Unicode (internal representation doesn't matter -- UTF-8, UTF-16,
UTF-32...).
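
A small sketch of what "internal representation doesn't matter" means in
practice, assuming Python 3 (the variable names are just for
illustration): U+201E has a different byte sequence in each encoding, but
every one of them decodes back to the same character.

```python
# The character from the bdquo declaration: U+201E.
ch = "\u201e"

# Little-endian variants are used so that no byte-order mark is emitted.
encodings = ["utf-8", "utf-16-le", "utf-32-le"]

# Encode the one character in each encoding, then decode it again.
encoded = {name: ch.encode(name) for name in encodings}
decoded = {name: data.decode(name) for name, data in encoded.items()}

for name in encodings:
    print(name, encoded[name].hex(), decoded[name] == ch)
```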
It's been some time since I last read XML specs, so I could be very
wrong, but I seem to remember that XML is basically inseparable from
Unicode.
HTH,
Marius Gedminas
--
Hanlon's Razor:
Never attribute to malice that which is adequately explained
by stupidity.