[reportlab-users] pyRXP vs ONIX DTD
Thu, 5 Dec 2002 14:20:26 +1100
I'm trying to use pyRXP to validate ONIX documents that I am
generating. However, I am getting lots of 'not a valid 8-bit XML
character' warnings unless I set the IgnoreEntities flag to true. The
ONIX DTD looks fine to me, although I'm no expert. The first character
that is picked up is "Œ" (0x152), which seems valid to my cursory reading
of the XML 1.0 spec.
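For what it's worth, here is a minimal sketch of that reading of the spec. It checks a code point against the Char production from XML 1.0 and, separately, against an 8-bit limit. The helper names (is_xml10_char, fits_in_8_bits) are made up for illustration and are not part of pyRXP or RXP; the 8-bit test is only my assumption about what the error message means.

```python
# Ranges from the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML10_CHAR_RANGES = (
    (0x9, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_xml10_char(codepoint):
    """Return True if codepoint is allowed by the XML 1.0 Char production."""
    for low, high in XML10_CHAR_RANGES:
        if low <= codepoint <= high:
            return True
    return False


def fits_in_8_bits(codepoint):
    """Return True if codepoint can be held in a single 8-bit character.

    Assumption: this is roughly the limit a non-Unicode parser build applies.
    """
    return codepoint <= 0xFF


OELIG = 0x152  # the code point named in the error message

print("valid XML 1.0 Char:", is_xml10_char(OELIG))
print("fits in 8 bits:", fits_in_8_bits(OELIG))
```

If that reading is right, the character is legal XML but too large for an 8-bit character type, which would point at how the parser was built rather than at the DTD.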
Can anyone confirm if this is a problem with the ONIX DTD, or a bug or
limitation of the RXP engine being used by pyRXP? Similar issues appear
to have been raised in the past with regard to DocBook, with the
solution being to build RXP with Unicode support.
I'd guess that the DTD is being retrieved by the C engine, so Python's
Unicode support would have no bearing on it. I'd really like to be able
to validate with maximum paranoia, as I'm generating many ONIX records
from untrusted source data.
ONIX = u'''<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE ONIXMessage SYSTEM
And the output:
Traceback (most recent call last):
File "bug.py", line 7, in ?
pyRXP.Error: Error: 0x152 is not a valid 8-bit XML character
in entity "xhtml-special" at line 33 char 25 of
in entity "MainModule" at line 2059 char 16 of
in unnamed entity at line 625 char 13 of
0x152 is not a valid 8-bit XML character
Stuart Bishop <email@example.com>