[reportlab-users] Unicode pyRXP and malloc problems in pyRXP

Stuart Bishop reportlab-users@reportlab.com
Mon, 10 Feb 2003 17:51:58 +1100



I've been tapping away a bit and have put together a Unicode-aware
version of pyRXP (called uRXP) that will happily coexist in the same
distribution as pyRXP (I'll submit a patch when I'm done, unless
someone gives me CVS access first). pyRXP rejects character references
that don't fit in 8 bits, while uRXP handles them:

 >>> import pyRXP
 >>> xml = '<phrase>Luv U long time &#xA3;5</phrase>'
 >>> pyRXP.Parser().parse(xml)
('phrase', None, ['Luv U long time \xa35'], None)
 >>> xml = '<phrase>I &#x2764; my poodle</phrase>'
 >>> pyRXP.Parser().parse(xml)
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
pyRXP.Error: Error: 0x2764 is not a valid 8-bit XML character
  in unnamed entity at line 1 char 19 of [unknown]
error return=1
0x2764 is not a valid 8-bit XML character
Parse Failed!

 >>> import uRXP
 >>> uRXP.Parser().parse(xml)
(u'phrase', None, [u'I \u2764 my poodle'], None)
 >>>
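
To make the 8-bit limit concrete, here is a minimal sketch (modern Python 3, standard library only) that lists the numeric character references in a document whose code points exceed 0xFF. The helper name refs_outside_8bit is made up for illustration and is not part of pyRXP or RXP; it only approximates the check that produces the error above.

```python
import re

# Matches decimal (&#163;) and hexadecimal (&#xA3;) character references.
_CHAR_REF = re.compile(r'&#(?:x([0-9A-Fa-f]+)|([0-9]+));')

def refs_outside_8bit(xml):
    """Return the code points of numeric character references above 0xFF."""
    too_wide = []
    for hex_digits, dec_digits in _CHAR_REF.findall(xml):
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code_point > 0xFF:
            too_wide.append(code_point)
    return too_wide

# 0xA3 fits in 8 bits, 0x2764 does not.
print(refs_outside_8bit('<phrase>Luv U long time &#xA3;5</phrase>'))
print([hex(cp) for cp in refs_outside_8bit('<phrase>I &#x2764; my poodle</phrase>')])
```

Any document for which this returns a non-empty list would presumably need the 16-bit build.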

This version runs RXP in 16-bit mode. It is currently more than twice
as slow at parsing, while init and traverse times are about the same:
pyRXP: init 0.0100, parse 0.2100, traverse 0.0800
uRXP: init 0.0000, parse 0.4900, traverse 0.0800
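
For anyone wanting to repeat the comparison, here is a rough sketch of a harness that times the same three phases. It is written for modern Python 3, and the names time_phases and count_nodes are invented here; this is not the script that produced the figures above.

```python
import time

def time_phases(make_parser, traverse, xml):
    """Time parser construction, parsing, and tree traversal separately."""
    t0 = time.perf_counter()
    parser = make_parser()          # e.g. pyRXP.Parser
    t1 = time.perf_counter()
    tree = parser.parse(xml)
    t2 = time.perf_counter()
    traverse(tree)
    t3 = time.perf_counter()
    return {'init': t1 - t0, 'parse': t2 - t1, 'traverse': t3 - t2}

def count_nodes(node):
    """Walk a (name, attrs, children, spare) tuple tree and count elements."""
    if not isinstance(node, tuple):
        return 0                    # a text child, not an element
    children = node[2] or []        # children is None for an empty element
    return 1 + sum(count_nodes(child) for child in children)
```

Assuming both modules are importable, time_phases(pyRXP.Parser, count_nodes, xml) and time_phases(uRXP.Parser, count_nodes, xml) on the same input give directly comparable numbers.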

While putting together test suites for this, I have found sample XML
documents that trigger malloc warnings when parsed under OS X, and I can
reproduce the warnings with an unmodified pyRXP 0.9. I don't know whether
this is an OS X-specific issue or a general one that only OS X warns
about:

Python 2.2.2 (#1, 02/08/03, 12:02:49)
[GCC Apple cpp-precomp 6.14] on darwin
Type "help", "copyright", "credits" or "license" for more information.
 >>> import pyRXP
 >>> p = pyRXP.Parser()
 >>> p.parse(open('001.xml').read())
*** malloc[19496]: Deallocation of a pointer not malloced: 0x1a4450; 
This could be a double free(), or free() called with the middle of an 
allocated block; Try setting environment variable MallocHelp to see 
tools to help debug
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
pyRXP.Error: Error: EOE in comment
  in entity "e" defined at line 2 char 1 of 
file:///Users/zen/src/pyRXP-cvs/test/xmltest/invalid/001.ent
  in unnamed entity at line 3 char 4 of 
file:///Users/zen/src/pyRXP-cvs/test/xmltest/invalid/001.ent
error return=1
EOE in comment
Parse Failed!

The parse failure itself is correct. The problem is the 'Deallocation
of a pointer not malloced' warning that precedes it.

I'm pretty sure this is a pyRXP issue rather than an RXP issue, as
running rxp over the file doesn't generate a warning (although I guess
this might be due to differences in the build process between Python and
rxp).

I'll try to track this down, but I'd appreciate any pointers on fixing
this sort of thing.

I've attached 001.xml and 001.ent.
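
In case the attachments get mangled in transit, here is a small sketch (modern Python 3) that recreates both files byte for byte, including their CRLF line endings. The function name write_test_files is just an illustrative choice.

```python
import os

# File contents as attached, with CRLF line endings preserved.
FILES = {
    '001.xml': b'<!DOCTYPE doc SYSTEM "001.ent">\r\n<doc></doc>\r\n',
    '001.ent': b'<!ELEMENT doc EMPTY>\r\n<!ENTITY % e "<!--">\r\n%e; -->\r\n',
}

def write_test_files(directory='.'):
    """Write 001.xml and 001.ent into directory and return their paths."""
    paths = []
    for name, data in FILES.items():
        path = os.path.join(directory, name)
        with open(path, 'wb') as f:   # binary mode keeps the CRLFs intact
            f.write(data)
        paths.append(path)
    return paths
```

After calling write_test_files(), the interactive session above can be repeated from the same directory.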


001.xml (CRLF line endings):

<!DOCTYPE doc SYSTEM "001.ent">
<doc></doc>

001.ent (CRLF line endings):

<!ELEMENT doc EMPTY>
<!ENTITY % e "<!--">
%e; -->


-- 
Stuart Bishop <zen@shangri-la.dropbear.id.au>
http://shangri-la.dropbear.id.au/

