[reportlab-users] Unicode pyRXP and malloc problems in pyRXP
Stuart Bishop
reportlab-users@reportlab.com
Mon, 10 Feb 2003 17:51:58 +1100
I've been tapping away a bit and put together a Unicode-aware version
of pyRXP (called uRXP) that will happily coexist in the same
distribution as pyRXP (I'll submit a patch when I'm done, unless
someone gives me CVS access first):
>>> import pyRXP
>>> xml = '<phrase>Luv U long time £5</phrase>'
>>> pyRXP.Parser().parse(xml)
('phrase', None, ['Luv U long time \xa35'], None)
>>> xml = '<phrase>I ❤ my poodle</phrase>'
>>> pyRXP.Parser().parse(xml)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
pyRXP.Error: Error: 0x2764 is not a valid 8-bit XML character
in unnamed entity at line 1 char 19 of [unknown]
error return=1
0x2764 is not a valid 8-bit XML character
Parse Failed!
>>> import uRXP
>>> uRXP.Parser().parse(xml)
(u'phrase', None, [u'I \u2764 my poodle'], None)
>>>
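The difference between the two inputs above comes down to code point size. A minimal sketch in current Python 3 (not the Python 2.2 used in this post, and not using pyRXP at all), assuming "8-bit" here means code points up to 0xFF; the helper name fits_in_8_bits is made up for illustration:

```python
# Code points of the two non-ASCII characters used in the examples above.
pound = "\u00a3"   # POUND SIGN, shown as '\xa3' in the pyRXP output
heart = "\u2764"   # HEAVY BLACK HEART, the 0x2764 in the error message


def fits_in_8_bits(ch):
    """Return True if the character's code point fits in a single byte."""
    return ord(ch) <= 0xFF


for ch in (pound, heart):
    print("U+%04X fits in 8 bits: %s" % (ord(ch), fits_in_8_bits(ch)))

# Latin-1 maps code points 0-255 directly to byte values, so it can hold
# the pound sign but not the heart.
print(pound.encode("latin-1"))
try:
    heart.encode("latin-1")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)
```

This only illustrates why one character is accepted and the other rejected; it says nothing about how RXP checks characters internally.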
This version runs RXP in 16-bit mode. It is currently over twice as slow
at parsing:
pyRXP: init 0.0100, parse 0.2100, traverse 0.0800
uRXP: init 0.0000, parse 0.4900, traverse 0.0800
While putting together test suites for this, I'm finding cases where
I'm getting malloc warnings when parsing some sample XML documents
under OS X (which I can reproduce with an unmodified pyRXP 0.9). I don't
know if it is an OS X specific issue, or just an issue that is only
warned about under OS X:
Python 2.2.2 (#1, 02/08/03, 12:02:49)
[GCC Apple cpp-precomp 6.14] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyRXP
>>> p = pyRXP.Parser()
>>> p.parse(open('001.xml').read())
*** malloc[19496]: Deallocation of a pointer not malloced: 0x1a4450;
This could be a double free(), or free() called with the middle of an
allocated block; Try setting environment variable MallocHelp to see
tools to help debug
Traceback (most recent call last):
File "<stdin>", line 1, in ?
pyRXP.Error: Error: EOE in comment
in entity "e" defined at line 2 char 1 of
file:///Users/zen/src/pyRXP-cvs/test/xmltest/invalid/001.ent
in unnamed entity at line 3 char 4 of
file:///Users/zen/src/pyRXP-cvs/test/xmltest/invalid/001.ent
error return=1
EOE in comment
Parse Failed!
The parse failure itself is correct. The problem is the 'Deallocation
of a pointer not malloced' warning.
I'm pretty sure this is a pyRXP issue rather than an RXP issue, as
running rxp over the file doesn't generate a warning (although I guess
this might be due to differences in the build process between Python and
rxp).
I'll try to track this down, but I'd appreciate any pointers on fixing
this sort of thing.
I've attached 001.xml and 001.ent.
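For anyone who wants to reproduce this without saving the attachments, here is a rough sketch that recreates the two files. The byte strings are the decoded attachment contents, including their CRLF line endings; the function name write_test_case and the use of a temporary directory are just illustrative choices, and this is modern Python 3 rather than 2.2:

```python
import os
import tempfile

# Contents of the two attachments, with the CRLF line endings they were sent with.
XML_DOC = b'<!DOCTYPE doc SYSTEM "001.ent">\r\n<doc></doc>\r\n'
ENT_DOC = b'<!ELEMENT doc EMPTY>\r\n<!ENTITY % e "<!--">\r\n%e; -->\r\n'


def write_test_case(directory):
    """Write 001.xml and 001.ent into directory; return the path to 001.xml."""
    for name, data in (("001.xml", XML_DOC), ("001.ent", ENT_DOC)):
        # Binary mode keeps the CRLF line endings exactly as attached.
        with open(os.path.join(directory, name), "wb") as handle:
            handle.write(data)
    return os.path.join(directory, "001.xml")


if __name__ == "__main__":
    target = tempfile.mkdtemp()
    print(write_test_case(target))
```

Running the interpreter from the printed directory and repeating the p.parse(open('001.xml').read()) call from the session above should then hit the same external entity, since 001.xml refers to 001.ent by relative name.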
Attachment 001.xml:
<!DOCTYPE doc SYSTEM "001.ent">
<doc></doc>
Attachment 001.ent:
<!ELEMENT doc EMPTY>
<!ENTITY % e "<!--">
%e; -->
--
Stuart Bishop <zen@shangri-la.dropbear.id.au>
http://shangri-la.dropbear.id.au/