[reportlab-users] Large tables - exponentially increasing run times

Robin Becker reportlab-users@reportlab.com
Sun, 16 May 2004 10:21:33 +0100


Craig Ringer wrote:
> On Sat, 2004-05-15 at 06:33, Robin Becker wrote:
> 
> 
>>hi indeed. I hope your efforts were independent and not duplicated. The 
>>longtable stuff was rolled in prior to 1.19. There should be a class
>>
>>class LongTable(Table):
>>	'''Henning von Bargen's changes will be active'''
>>	_longTableOptimize = 1
> 
> 
> I saw that. It seems to tweak the behaviour of _calc_height. If one is
> able to pass a pre-built rowHeights to the Table class, that change
> doesn't really do anything.
> 
> 
>>Certainly there's an example at the end of tables.py which has 5000 
>>rows. Please try it out and report if it is still hopeless.
> 
> 
> I tried it before, but thought I'd double check to be sure. I had to add
> LongTable as an export from platypus/__init__.py so I wasn't too sure it
> was meant as an externally accessible class, despite it looking like
> one.
> 
> I'm finding LongTable to be _slower_ than Table if I specify rowHeights
> and colWidths for Table. LongTable only permits colWidths.
> 
> Here are some example run times. tt.py is my basic test script, ttl uses
> the LongTable class instead and doesn't pass a rowHeights parameter. The
> first arg is the output PDF, the second is the number of times to repeat
> the built-in 10-row dummy array.
> 
> [craig@rasputin bench]$ time ./tt.py x 100
> 
> real    0m1.926s
> user    0m1.832s
> sys     0m0.057s
> [craig@rasputin bench]$ time ./tt.py x 200
> 
> real    0m5.332s
> user    0m5.187s
> sys     0m0.061s
> [craig@rasputin bench]$ time ./tt.py x 400
> 
> real    0m18.277s
> user    0m17.501s
> sys     0m0.106s
> 
> [craig@rasputin bench]$ time ./ttl.py x 100
> 
> real    0m2.689s
> user    0m2.585s
> sys     0m0.063s
> [craig@rasputin bench]$ time ./ttl.py x 200
> 
> real    0m7.382s
> user    0m7.210s
> sys     0m0.058s
> [craig@rasputin bench]$ time ./ttl.py x 400
> 
> real    0m24.752s
> user    0m24.142s
> sys     0m0.102s
> 
> So ... LongTable doesn't seem to be the solution to my trouble, though
> it's definitely _vastly_ faster than Table without a passed rowHeights
> param.
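
As a rough check on how the run time grows, the "real" figures quoted above can be put through a log-log estimate. This is only a sketch: there are three data points per script and interpreter start-up time is included, so the result is indicative at best. The helper name `growth_exponents` is made up for illustration.

```python
import math

# Wall-clock ("real") seconds copied from the benchmark runs quoted above.
# Keys are the repeat count passed to the script (each repeat adds 10 rows).
timings = {
    "tt.py (Table, rowHeights given)": {100: 1.926, 200: 5.332, 400: 18.277},
    "ttl.py (LongTable)": {100: 2.689, 200: 7.382, 400: 24.752},
}


def growth_exponents(samples):
    """Return the apparent exponent k in t ~ n**k between consecutive sizes."""
    sizes = sorted(samples)
    return [
        math.log(samples[b] / samples[a]) / math.log(b / a)
        for a, b in zip(sizes, sizes[1:])
    ]


for name, samples in timings.items():
    exps = ", ".join(f"{k:.2f}" for k in growth_exponents(samples))
    print(f"{name}: apparent exponents {exps}")
```

For both scripts the apparent exponent comes out between 1 and 2 and rises with the row count, which looks more like polynomial growth heading towards quadratic than literally exponential growth.
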
> 
> John Pywtorak's suggestion of using many smaller tables sounds like a
> practical workaround that'd make it practical to use platypus for my
> purposes.
> 
> Still, that's what it seems like - a workaround. It strikes me as
> somewhat clumsy compared to the really clean, nice way most of platypus
> works, and looks to me like it'd be a right pain to get working with
> repeatRows. I can certainly make it work, but if there's any practical
> way the performance of the Table class can be improved for larger row
> counts I'd be interested in helping out.
> 
> I'll have a try at writing a Table subclass that splits off a chunk of
> the data and re-uses the parent on each split, if you folks think it'd
> be a reasonable approach.
> 
> Craig Ringer
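
The "many smaller tables" workaround discussed above might be sketched roughly as follows. The helper names `chunk_rows` and `build_tables`, the chunk size of 50 and the single fixed row height are all assumptions made for illustration. Only `Table` and its `colWidths`, `rowHeights` and `repeatRows` arguments come from platypus. Header rows are copied to the front of every chunk by hand, since `repeatRows` only acts within one table.

```python
def chunk_rows(data, chunk_size, repeat_rows=0):
    """Yield row lists of at most chunk_size body rows, each led by the header rows."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    header = list(data[:repeat_rows])
    body = data[repeat_rows:]
    for start in range(0, len(body), chunk_size):
        yield header + list(body[start:start + chunk_size])


def build_tables(data, col_widths, row_height, chunk_size=50, repeat_rows=1):
    """Build one platypus Table per chunk (needs reportlab installed)."""
    # Imported lazily so chunk_rows can be used and tested without reportlab.
    from reportlab.platypus import Table

    tables = []
    for rows in chunk_rows(data, chunk_size, repeat_rows):
        tables.append(
            Table(rows, colWidths=col_widths,
                  rowHeights=[row_height] * len(rows),  # assumes uniform rows
                  repeatRows=repeat_rows)
        )
    return tables
```

The returned tables would be appended to the story in order. A chunk boundary will not normally coincide with a page boundary, so a copied header can turn up mid-page; that is the kind of awkwardness with repeatRows mentioned above.
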

I don't think any general purpose table is as fast as it can be for a 
relatively fixed layout. Henning's long table was intended to address 
the problem where all rows are used to calculate heights and widths 
when the split could be detected more quickly; if the computed widths 
are reused there's obviously a saving. I've probably messed that up 
somehow. The row layout problem can be sped up, but there are a lot 
of special cases. I count various row kinds

start rows
normal rows
end rows
start after a split rows
end before a split rows

and various desires, such as not having widow or orphan rows. If a table 
is 10 pages long, do we want to start it halfway down a page, and so on?
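
To make the row kinds listed above concrete, here is a minimal sketch that labels each row of a table given the rows after which a split falls. Everything in it (`RowKind`, `classify_rows`, the `split_after` argument, and the precedence when a row could fit two kinds) is hypothetical and is not how tables.py does it.

```python
from enum import Enum


class RowKind(Enum):
    START = "start row"
    NORMAL = "normal row"
    END = "end row"
    START_AFTER_SPLIT = "start after a split row"
    END_BEFORE_SPLIT = "end before a split row"


def classify_rows(n_rows, split_after):
    """Label rows 0..n_rows-1; split_after holds indices of rows that end a fragment."""
    splits = set(split_after)
    kinds = []
    for i in range(n_rows):
        # Assumed precedence: table start/end win over split-related kinds.
        if i == 0:
            kinds.append(RowKind.START)
        elif i == n_rows - 1:
            kinds.append(RowKind.END)
        elif i in splits:
            kinds.append(RowKind.END_BEFORE_SPLIT)
        elif i - 1 in splits:
            kinds.append(RowKind.START_AFTER_SPLIT)
        else:
            kinds.append(RowKind.NORMAL)
    return kinds
```

Even this toy version has to pick a precedence for rows that fall into two kinds at once, which hints at how quickly the special cases pile up.
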
-- 
Robin Becker