[reportlab-users] Large tables - exponentially increasing run times
Robin Becker
reportlab-users@reportlab.com
Sun, 16 May 2004 10:21:33 +0100
Craig Ringer wrote:
> On Sat, 2004-05-15 at 06:33, Robin Becker wrote:
>
>
>>hi indeed. I hope your efforts were independent and not duplicated. The
>>longtable stuff was rolled in prior to 1.19. There should be a class
>>
>>class LongTable(Table):
>> '''Henning von Bargen's changes will be active'''
>> _longTableOptimize = 1
>
>
> I saw that. It seems to tweak the behaviour of _calc_height. If one is
> able to pass a pre-built rowHeights to the Table class, that change
> doesn't really do anything.
>
>
>>Certainly there's an example at the end of tables.py which has 5000
>>rows. Please try it out and report if it is still hopeless.
>
>
> I tried it before, but thought I'd double check to be sure. I had to add
> LongTable as an export from platypus/__init__.py so I wasn't too sure it
> was meant as an externally accessible class, despite it looking like
> one.
>
> I'm finding LongTable to be _slower_ than Table if I specify rowHeights
> and colWidths for Table. LongTable only permits colWidths.
>
> Here are some example run times. tt.py is my basic test script, ttl uses
> the LongTable class instead and doesn't pass a rowHeights parameter. The
> first arg is the output PDF, the second is the number of times to repeat
> the built-in 10-row dummy array.
>
> [craig@rasputin bench]$ time ./tt.py x 100
>
> real 0m1.926s
> user 0m1.832s
> sys 0m0.057s
> [craig@rasputin bench]$ time ./tt.py x 200
>
> real 0m5.332s
> user 0m5.187s
> sys 0m0.061s
> [craig@rasputin bench]$ time ./tt.py x 400
>
> real 0m18.277s
> user 0m17.501s
> sys 0m0.106s
>
> [craig@rasputin bench]$ time ./ttl.py x 100
>
> real 0m2.689s
> user 0m2.585s
> sys 0m0.063s
> [craig@rasputin bench]$ time ./ttl.py x 200
>
> real 0m7.382s
> user 0m7.210s
> sys 0m0.058s
> [craig@rasputin bench]$ time ./ttl.py x 400
>
> real 0m24.752s
> user 0m24.142s
> sys 0m0.102s
>
> So ... LongTable doesn't seem to be the solution to my trouble, though
> it's definitely _vastly_ faster than Table without a passed rowHeights
> param.
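
As a rough check on how steeply the quoted run times grow, the sketch
below fits an exponent k in t ~ n**k between consecutive runs. The
numbers are the "real" times from the runs above; the function name is
made up for illustration. Fixed start-up cost is not subtracted, and
three points per script is very little data, so treat the result as
indicative only.

```python
import math

# Wall-clock ("real") times in seconds, copied from the quoted runs.
# Keys are the repeat counts passed to the scripts (10 rows per repeat).
table_times = {100: 1.926, 200: 5.332, 400: 18.277}      # tt.py
longtable_times = {100: 2.689, 200: 7.382, 400: 24.752}  # ttl.py


def growth_exponents(times):
    """Return the exponent k in t ~ n**k for each consecutive pair of sizes."""
    sizes = sorted(times)
    return [
        math.log(times[b] / times[a]) / math.log(b / a)
        for a, b in zip(sizes, sizes[1:])
    ]


for name, times in (("Table", table_times), ("LongTable", longtable_times)):
    print(name, [round(k, 2) for k in growth_exponents(times)])
```

On these figures the exponents come out between about 1.4 and 1.8 for
both scripts, and they rise as the row count doubles. That looks more
like growth heading towards quadratic than strictly exponential growth,
though it is still far from linear.
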
>
> John Pywtorak's suggestion of using many smaller tables sounds like a
> practical workaround that'd make it feasible to use platypus for my
> purposes.
>
> Still, that's what it seems like - a workaround. It strikes me as
> somewhat clumsy compared to the really clean, nice way most of platypus
> works, and looks to me like it'd be a right pain to get working with
> repeatRows. I can certainly make it work, but if there's any practical
> way the performance of the Table class can be improved for larger row
> counts I'd be interested in helping out.
>
> I'll have a try at writing a Table subclass that splits off a chunk of
> the data and re-uses the parent on each split, if you folks think it'd
> be a reasonable approach.
>
> Craig Ringer
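
A minimal sketch of the "many smaller tables" workaround, written in
plain Python so it runs without reportlab. The helper name chunk_rows
and its parameters are invented here, not part of platypus. It copies
the leading header rows onto every chunk, which only approximates
repeatRows: the header repeats at the top of each chunk rather than at
the top of each page.

```python
def chunk_rows(data, chunk_size, repeat_rows=0):
    """Yield row lists holding at most chunk_size body rows each.

    The first repeat_rows rows of data are treated as header rows and are
    copied to the front of every chunk.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    header = data[:repeat_rows]
    body = data[repeat_rows:]
    for start in range(0, len(body), chunk_size):
        yield header + body[start:start + chunk_size]


# Example: one header row plus five body rows, two body rows per chunk.
data = [["name", "value"]] + [["row%d" % i, i] for i in range(5)]
for chunk in chunk_rows(data, 2, repeat_rows=1):
    print(chunk)

# Assumed use with platypus: build one Table per chunk and append each
# to the story, e.g. Table(chunk, colWidths=widths). A pre-built
# rowHeights list would need to be sliced the same way as the data.
```

Each chunk then costs only as much as a small table, at the price of
handling headers and page breaks by hand.
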
I don't think any general-purpose table is as fast as a table can be
for a relatively fixed layout. Henning's long table was intended to
address the problem where all rows are used to calculate heights and
widths when the split could be detected more quickly; if the computed
widths are reused there's obviously a saving. I've probably messed that
up somehow. The row layout problem can be sped up, but there are a lot
of special cases. I count various row kinds:
    start rows
    normal rows
    end rows
    start after a split rows
    end before a split rows
and various desires such as not having widow or orphan rows. If a table
is 10 pages long, do we want to start it halfway down a page, and so on?
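
To make the row kinds above concrete, here is a small sketch that labels
each row once the split points are known. It is illustrative only and is
not how tables.py works; the function name and the split_starts argument
are made up. A row can fall into two kinds at once (for example the last
row of a one-row piece), so the sketch simply picks one by a fixed
precedence.

```python
def classify_rows(n_rows, split_starts):
    """Label each row of an n_rows table with one of the five row kinds.

    split_starts holds the index of the first row of every continuation
    piece, so a split falls just before each of those rows.
    """
    starts = set(split_starts)
    labels = []
    for i in range(n_rows):
        if i == 0:
            labels.append("start")
        elif i in starts:
            labels.append("start after split")
        elif i == n_rows - 1:
            labels.append("end")
        elif i + 1 in starts:
            labels.append("end before split")
        else:
            labels.append("normal")
    return labels


# Seven rows split before rows 3 and 5.
for index, label in enumerate(classify_rows(7, [3, 5])):
    print(index, label)
# 0 start
# 1 normal
# 2 end before split
# 3 start after split
# 4 end before split
# 5 start after split
# 6 end
```

Widow and orphan control would then amount to rejecting split points
that leave a piece with too few rows, which this sketch does not try to
do.
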
--
Robin Becker