[reportlab-users] memory leak in reportlab xmllib.FastXMLParser

Andy Robinson andy at reportlab.com
Wed Dec 18 12:28:50 EST 2013


Hi Mirko (and everyone),

I suggest it's not worth spending too much time on this.

The last stage of our port to Python 3.3 is for us to change the
ParaParser to use something available in both Python 2.7 and 3.3, and
get rid of sgmlop/xmllib forever. I think we will be there within two
weeks.

The problem is that we need to parse a lot of little chunks of text,
and the available C-based parsers need some expensive setup (e.g. to
set up all the entities), and then we loop over them in Python anyway.
After various speed experiments, I have concluded that there is no
performance benefit to messing around with expat/etree/lxml/pyRXP, so
I'm currently trying to rewrite paraparser.py using the html.parser
module in Python's standard library. This will allow us to be fairly
tolerant of poor markup, and to initialize a parser object quickly. And
we can get rid of sgmlop/xmllib forever. I would hope that a parser in
the standard library is leak-free; if not, at least it's Somebody
Else's Problem ;-)
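A minimal sketch of the html.parser approach, assuming Python 3.4+ (for the `convert_charrefs` argument; in Python 2.7 the module is `HTMLParser` and lacks it). The class and method names (`MiniParaParser`, `parse_chunk`) are hypothetical; this is not ReportLab's actual ParaParser, just an illustration of cheap per-chunk parsing that tolerates stray tags:

```python
from html.parser import HTMLParser


class MiniParaParser(HTMLParser):
    """Hypothetical sketch: split a small chunk of paragraph markup into
    (text, active_tags) fragments. Not ReportLab's real ParaParser."""

    def __init__(self):
        # convert_charrefs=True makes HTMLParser decode entities such as
        # &amp; and &#169; before handle_data() is called.
        super().__init__(convert_charrefs=True)
        self._stack = []
        self.fragments = []

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        # Tolerate poor markup: if the tag is open, pop back to it;
        # otherwise silently ignore the stray close tag.
        if tag in self._stack:
            while self._stack:
                if self._stack.pop() == tag:
                    break

    def handle_data(self, data):
        # Record each run of text together with the tags wrapping it.
        self.fragments.append((data, tuple(self._stack)))

    def parse_chunk(self, text):
        # reset() clears HTMLParser's internal buffers, so one instance can
        # be reused for many small chunks without rebuilding a parser.
        self.reset()
        self._stack = []
        self.fragments = []
        self.feed(text)
        self.close()
        return self.fragments


if __name__ == "__main__":
    p = MiniParaParser()
    print(p.parse_chunk("Hello <b>bold &amp; <i>both</i></b> done"))
    print(p.parse_chunk("unbalanced</u> markup"))
```

Reusing one instance through `reset()` keeps the per-chunk setup cost low, which is the concern raised above about the C-based parsers.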

Once we get this done, we hope to 'juggle branches' so that the
default code is running the new paraparser and work towards a release
in January or early February.

- Andy


On 17 December 2013 16:14, Mirko Dziadzka <mirko.dziadzka at gmail.com> wrote:

> Hi
>
> I’m not sure if this is the right list for a bug report; any pointers to another address are welcome.
>
> Problem description
> ===================
>
> The following program creates a memory leak.
>
> As a result, using reportlab.platypus.paraparser.ParaParser creates a memory leak too.
> In turn, wordaxe has a memory leak (my original problem).
>
> # show memory leak
> import gc
> from reportlab.lib import xmllib
>
> assert xmllib.sgmlop  # check that we are using FastXMLParser
>
> while True:
>     parser = xmllib.XMLParser(verbose=0)
>     parser.close()
>     gc.collect()
>
> How to reproduce
> ================
>
> Just start this program and watch the memory usage go up … see the 6th column (RSS) in the ps output below:
>
> $ while sleep 10 ; do ps auxww | grep python | grep -v grep ; done
> mirko 1023 100,0 0,3 2467680 25608 s000 R+ 5:10pm 1:00.44 python t.py
> mirko 1023 100,0 0,3 2469728 27424 s000 R+ 5:10pm 1:10.47 python t.py
> mirko 1023 99,3 0,3 2471520 29048 s000 R+ 5:10pm 1:20.50 python t.py
> mirko 1023 100,0 0,4 2472288 30616 s000 R+ 5:10pm 1:30.54 python t.py
>
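A small, Unix-only sketch for measuring the same growth from inside Python instead of watching ps. It uses the standard `resource` module; the helper names are hypothetical. It reports the peak (high-water) RSS rather than the current RSS, because the standard library has no portable call for the latter; a steady leak keeps pushing the peak up.

```python
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of this process so far, in bytes (Unix only)."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but in kilobytes on Linux.
    return rss if sys.platform == "darwin" else rss * 1024


def peak_rss_growth(make_and_close, iterations=100000):
    """Call make_and_close() `iterations` times and return how many bytes
    the peak RSS grew. A leak-free loop should leave it roughly flat."""
    before = peak_rss_bytes()
    for _ in range(iterations):
        make_and_close()
    return peak_rss_bytes() - before


if __name__ == "__main__":
    kept = []

    def leaky():
        # Deliberately leaky stand-in: every call keeps 10 KB alive.
        kept.append(bytearray(10 * 1024))

    print("leak-free:", peak_rss_growth(lambda: bytearray(10 * 1024), 1000))
    print("leaky:    ", peak_rss_growth(leaky, 1000))
```

For the loop in the report, one could pass something like `lambda: xmllib.XMLParser(verbose=0).close()` as `make_and_close` (an untested assumption about how it would be wired up).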

> I tested this with reportlab-2.5 and reportlab-2.7 on CentOS-6-64bit and MacOS 10.8 with Python 2.7 and Python 2.6.
>
> Analysis
> ========
>
> It seems that there is a cyclic reference between FastXMLParser and sgmlop, and parser.close() does not clean it up.
>
> Using SlowXMLParser instead of XMLParser works fine.
>
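The analysis above points to a reference cycle between the Python wrapper and the C-level sgmlop parser. A minimal sketch of the general fix, breaking the cycle explicitly in `close()`, is below. The classes are hypothetical pure-Python stand-ins, not the real xmllib/sgmlop code. One plausible reason such a leak survives the `gc.collect()` call in the report is that CPython's cycle collector only sees references held by objects whose types support it, so a cycle running through a C object without that support is never found. The sketch imitates this by disabling the collector and using a weak reference to check whether reference counting alone frees the wrapper.

```python
import gc
import weakref


class FakeCParser:
    """Hypothetical stand-in for a C-level parser; it just stores the
    callbacks it is given (this is not sgmlop's real API)."""

    def __init__(self):
        self.callbacks = {}

    def register(self, **callbacks):
        self.callbacks.update(callbacks)


class WrapperParser:
    """Hypothetical stand-in for a Python wrapper like FastXMLParser."""

    def __init__(self):
        self.parser = FakeCParser()
        # Registering a bound method creates the cycle:
        # wrapper -> parser -> bound method -> wrapper
        self.parser.register(data=self.handle_data)

    def handle_data(self, text):
        pass

    def close(self):
        # Break the cycle so plain reference counting can free both
        # objects, without relying on the cyclic garbage collector.
        if self.parser is not None:
            self.parser.callbacks.clear()
            self.parser = None


def freed_without_gc(call_close):
    """Return True if a WrapperParser is freed by refcounting alone."""
    was_enabled = gc.isenabled()
    gc.disable()  # mimic a cycle the collector cannot see
    try:
        wrapper = WrapperParser()
        ref = weakref.ref(wrapper)
        if call_close:
            wrapper.close()
        del wrapper
        return ref() is None
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print("with close():   ", freed_without_gc(True))
    print("without close():", freed_without_gc(False))
```

Under CPython's reference counting, the wrapper is freed only when `close()` has broken the cycle; without it, the objects stay alive until a collector that can see the cycle runs.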

> _______________________________________________
> reportlab-users mailing list
> reportlab-users at lists2.reportlab.com
> http://two.pairlist.net/mailman/listinfo/reportlab-users

--
Andy Robinson
Managing Director
ReportLab Europe Ltd.
Thornton House, Thornton Road, Wimbledon, London SW19 4NG, UK
Tel +44-20-8405-6420

