[reportlab-users] pyRXP vs ONIX DTD
Stuart Bishop
reportlab-users@reportlab.com
Thu, 5 Dec 2002 14:20:26 +1100
I'm trying to use pyRXP to validate ONIX documents that I am
generating. However, I am getting lots of 'not a valid 8-bit XML
character' warnings unless I set the IgnoreEntities flag to true. The
ONIX DTD looks fine to me, although I'm no expert. The first character
that is picked up is "Œ" (U+0152), which seems valid to my cursory
reading of the XML 1.0 spec.
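To make that reading concrete: the XML 1.0 `Char` production allows #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD] and [#x10000-#x10FFFF]. U+0152 (the 0x152 reported in the traceback) is therefore a legal XML character. It just doesn't fit in a single byte, which is what an "8-bit" check would reject. A minimal Python 3 sketch of both checks (the helper names are illustrative and are not part of pyRXP or RXP):

```python
def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def fits_in_8_bits(cp: int) -> bool:
    """True if cp can be stored in one byte (the Latin-1 range)."""
    return cp <= 0xFF


oe = ord("\u0152")  # LATIN CAPITAL LIGATURE OE, i.e. "Œ"
print(hex(oe), is_xml_char(oe), fits_in_8_bits(oe))  # 0x152 True False
```

If both checks behave like this, the complaint looks like a limitation of an 8-bit build rather than a defect in the character itself.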
Can anyone confirm whether this is a problem with the ONIX DTD, or a bug
or limitation of the RXP engine used by pyRXP? Similar issues seem to
have been raised before with DocBook, where the solution was to build
RXP with Unicode support.
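One way to narrow down whether the DTD is at fault is to list every numeric character reference above 0xFF in the entity files, since those are exactly what an 8-bit build cannot represent. A rough Python 3 sketch, assuming you have local copies of the .ent/.elt files to feed it. The sample string is made up for illustration and is not quoted from xhtml-special.ent, and the scanner does not skip references inside comments:

```python
import re

# Matches decimal (&#338;) and hexadecimal (&#x152;) character references.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def wide_char_refs(text: str, limit: int = 0xFF):
    """Yield (line_number, code_point) for each character reference above limit."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in CHAR_REF.finditer(line):
            hex_digits, dec_digits = m.groups()
            cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
            if cp > limit:
                yield lineno, cp


# Illustrative entity declarations only.
sample = '<!ENTITY gt "&#62;">\n<!ENTITY OElig "&#338;">\n'
for lineno, cp in wide_char_refs(sample):
    print(lineno, hex(cp))  # 2 0x152
```

Every hit is a character that an 8-bit build will reject even though XML 1.0 permits it.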
I'd guess that the DTD is retrieved and parsed by the C engine, so
Python's Unicode support would have no bearing on it. I'd really like
to be able to validate with maximum paranoia, as I'm generating many
ONIX records from untrusted source data.
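Because the source data is untrusted, it may also help to reject characters that XML 1.0 forbids outright (most C0 control characters, for instance) before they reach the generated ONIX, independently of whatever pyRXP does with the DTD. A minimal sketch, assuming Python 3 and plain `str` input; the function name and the sample field are hypothetical:

```python
import re

# Any character NOT matched by the XML 1.0 Char production,
# including lone surrogates (U+D800-U+DFFF).
INVALID_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def invalid_xml_chars(text: str):
    """Return (index, code_point) for each character XML 1.0 forbids."""
    return [(m.start(), ord(m.group())) for m in INVALID_XML_CHAR.finditer(text)]


# Hypothetical untrusted field: "é" and "Œ" are fine, the vertical tab is not.
record_title = "Caf\u00e9 \u0152uvres\x0b"
print(invalid_xml_chars(record_title))  # [(11, 11)]
```

Note that this only catches characters XML can never contain. It says nothing about whether an 8-bit parser build will accept the rest.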
Minimal example:
import pyRXP
ONIX = u'''<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE ONIXMessage SYSTEM
"http://www.editeur.org/onix/2.0/reference/onix-international.dtd">
<ONIXMessage></ONIXMessage>
'''
pyRXP.Parser().parse(ONIX)
And the output:
Traceback (most recent call last):
  File "bug.py", line 7, in ?
    pyRXP.Parser().parse(ONIX)
pyRXP.Error: Error: 0x152 is not a valid 8-bit XML character
in entity "xhtml-special" at line 33 char 25 of
http://www.editeur.org/onix/2.0/reference/xhtml-special.ent
in entity "MainModule" at line 2059 char 16 of
http://www.editeur.org/onix/2.0/reference/onix-international.elt
in unnamed entity at line 625 char 13 of
http://www.editeur.org/onix/2.0/reference/onix-international.dtd
error return=1
0x152 is not a valid 8-bit XML character
Parse Failed!
--
Stuart Bishop <zen@shangri-la.dropbear.id.au>
http://shangri-la.dropbear.id.au/