[reportlab-users] Large tables - exponentially increasing run times

Craig Ringer reportlab-users@reportlab.com
Sun, 16 May 2004 16:01:39 +0800


On Sat, 2004-05-15 at 06:33, Robin Becker wrote:

> hi indeed. I hope your efforts were independent and not duplicated. The 
> longtable stuff was rolled in prior to 1.19. There should be a class
> 
> class LongTable(Table):
> 	'''Henning von Bargen's changes will be active'''
> 	_longTableOptimize = 1

I saw that. It seems to tweak the behaviour of _calc_height. If one is
able to pass a pre-built rowHeights to the Table class, that change
doesn't really do anything.

> 
> Certainly there's an example at the end of tables.py which has 5000 
> rows. Please try it out and report if it is still hopeless.

I tried it before, but thought I'd double check to be sure. I had to add
LongTable as an export from platypus/__init__.py so I wasn't too sure it
was meant as an externally accessible class, despite it looking like
one.

I'm finding LongTable to be _slower_ than Table if I specify rowHeights
and colWidths for Table. LongTable only permits colWidths.

Here are some example run times. tt.py is my basic test script; ttl.py
uses the LongTable class instead and doesn't pass a rowHeights
parameter. The first arg is the output PDF, the second is the number of
times to repeat the built-in 10-row dummy array.

[craig@rasputin bench]$ time ./tt.py x 100

real    0m1.926s
user    0m1.832s
sys     0m0.057s
[craig@rasputin bench]$ time ./tt.py x 200

real    0m5.332s
user    0m5.187s
sys     0m0.061s
[craig@rasputin bench]$ time ./tt.py x 400

real    0m18.277s
user    0m17.501s
sys     0m0.106s

[craig@rasputin bench]$ time ./ttl.py x 100

real    0m2.689s
user    0m2.585s
sys     0m0.063s
[craig@rasputin bench]$ time ./ttl.py x 200

real    0m7.382s
user    0m7.210s
sys     0m0.058s
[craig@rasputin bench]$ time ./ttl.py x 400

real    0m24.752s
user    0m24.142s
sys     0m0.102s

So ... LongTable doesn't seem to be the solution to my trouble, though
it's definitely _vastly_ faster than Table without a passed rowHeights
param.
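
As a rough sanity check on how the run times above scale, here is a
small sketch that estimates the growth exponent k in "time ~ rows**k"
from each consecutive pair of runs (k = 1 would be linear, k = 2
quadratic). The helper name growth_exponents is made up for this
illustration, and with only three data points per script the result is
indicative at best. For these numbers the exponents land between
roughly 1.4 and 1.8, which looks closer to polynomial growth heading
towards quadratic than to strictly exponential growth.

```python
import math

# Wall-clock ("real") seconds copied from the runs above, keyed by the
# repeat count given on the command line.
timings = {
    "tt.py (Table, rowHeights given)": {100: 1.926, 200: 5.332, 400: 18.277},
    "ttl.py (LongTable)": {100: 2.689, 200: 7.382, 400: 24.752},
}


def growth_exponents(samples):
    """Estimate k in time ~ n**k for each consecutive pair of sizes."""
    sizes = sorted(samples)
    result = []
    for small, large in zip(sizes, sizes[1:]):
        time_ratio = samples[large] / samples[small]
        size_ratio = float(large) / small
        result.append(math.log(time_ratio) / math.log(size_ratio))
    return result


for name in sorted(timings):
    exponents = ["%.2f" % k for k in growth_exponents(timings[name])]
    print("%s: %s" % (name, ", ".join(exponents)))
```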

John Pywtorak's suggestion of using many smaller tables sounds like a
workaround that'd make it practical to use platypus for my purposes.

Still, that's what it seems like - a workaround. It strikes me as
somewhat clumsy compared to the really clean, nice way most of platypus
works, and looks to me like it'd be a right pain to get working with
repeatRows. I can certainly make it work, but if there's any practical
way the performance of the Table class can be improved for larger row
counts I'd be interested in helping out.

I'll have a try at writing a Table subclass that splits off a chunk of
the data and re-uses the parent on each split, if you folks think it'd
be a reasonable approach.
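
For reference, the "many smaller tables" workaround could look roughly
like the sketch below. This is not the Table subclass described above,
just plain chunking of the row data. The names split_rows and
build_tables and the default chunk size of 50 are made up for this
example. Note that the header row is repeated once per chunk, not once
per page, so it is only a rough stand-in for repeatRows.

```python
def split_rows(header, rows, chunk_size):
    """Return a list of row lists, each one starting with the header row."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunks = []
    for start in range(0, len(rows), chunk_size):
        chunks.append([header] + rows[start:start + chunk_size])
    return chunks


def build_tables(header, rows, chunk_size=50, row_height=12, col_widths=None):
    """Build one small Table flowable per chunk (needs reportlab installed)."""
    # Imported here so split_rows can be used and tested without reportlab.
    from reportlab.platypus import Table
    tables = []
    for data in split_rows(header, rows, chunk_size):
        tables.append(Table(data,
                            colWidths=col_widths,
                            rowHeights=[row_height] * len(data)))
    return tables
```

In tt.py this would replace the single Table with something like
elements.extend(build_tables(table_data[0], table_data[1:],
col_widths=[8*cm, 5*cm, 5*cm])).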

Craig Ringer

Attachment: tt.py

#!/usr/bin/env python2.3

import os,sys
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle
from reportlab.lib.units import cm
from reportlab.lib.pagesizes import A4
from reportlab.lib import colors


def main():
    if len(sys.argv) == 3:
        outfile = sys.argv[1]
    else:
        # The second argument is a repeat count for the 10-row dummy data.
        raise Exception("Usage: %s output_file.pdf repeat_count" % sys.argv[0])

    # Assemble our dummy table. Would normally be from a DB, etc.
    # Note that we're multiplying its size by the second arg to the
    # script, making it easy to create huge tables by tweaking a command
    # line param.
    # Company name             Num ordered     Failure rate
    input_data = [
        ["Acme Rocket Co",      500,            0.9],
        ["Acme Vehicles, Inc",  7,              1.0],
        ["Acme Foods",          5000000,        0.2],
        ["Real Plastic",        1,              1.0],
        ["Evil Media",          12,             0.0],
        ["Fizzy Dice",          8,              0.0],
        ["Acme Aero",           1,              1.0],
        ["Acme Arms & Armour",  30,             0.85],
        ["Acme Aero",           1,              1.0],
        ["Real Plastic",        1,              1.0],
        ]
    input_data = input_data * int(sys.argv[2])

    doc = SimpleDocTemplate(outfile)

    elements = []   # holds page flowables

    # Assemble the list we'll use to hold the assembled table
    table_data = [["Company Name", "Num Ordered", "Fail Rate"]]
    table_data.extend(input_data)

    # Construct the Platypus table object that'll render it all,
    # feeding it the table_data array.
    t = Table(table_data,
            rowHeights=[12] * len(table_data),
            colWidths=[8*cm, 5*cm, 5*cm]
        )

    # Add the table to the list of flowables to draw
    elements.append(t)

    # Render the page to the canvas and save it.
    doc.build(elements)

if __name__ == '__main__':
    main()

Attachment: ttl.py

#!/usr/bin/env python2.3

import os,sys
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle
# LongTable is imported straight from tables.py here, so no change to
# platypus/__init__.py is needed to run this script.
from reportlab.platypus.tables import LongTable
from reportlab.lib.units import cm
from reportlab.lib.pagesizes import A4
from reportlab.lib import colors


def main():
    if len(sys.argv) == 3:
        outfile = sys.argv[1]
    else:
        # The second argument is a repeat count for the 10-row dummy data.
        raise Exception("Usage: %s output_file.pdf repeat_count" % sys.argv[0])

    # Assemble our dummy table. Would normally be from a DB, etc.
    # Note that we're multiplying its size by the second arg to the
    # script, making it easy to create huge tables by tweaking a command
    # line param.
    # Company name             Num ordered     Failure rate
    input_data = [
        ["Acme Rocket Co",      500,            0.9],
        ["Acme Vehicles, Inc",  7,              1.0],
        ["Acme Foods",          5000000,        0.2],
        ["Real Plastic",        1,              1.0],
        ["Evil Media",          12,             0.0],
        ["Fizzy Dice",          8,              0.0],
        ["Acme Aero",           1,              1.0],
        ["Acme Arms & Armour",  30,             0.85],
        ["Acme Aero",           1,              1.0],
        ["Real Plastic",        1,              1.0],
        ]
    input_data = input_data * int(sys.argv[2])

    doc = SimpleDocTemplate(outfile)

    elements = []   # holds page flowables

    # Assemble the list we'll use to hold the assembled table
    table_data = [["Company Name", "Num Ordered", "Fail Rate"]]
    table_data.extend(input_data)

    # Construct the Platypus LongTable object that'll render it all,
    # feeding it the table_data array. Unlike tt.py, no rowHeights
    # parameter is passed; only colWidths is given.
    t = LongTable(table_data,
            colWidths=[8*cm, 5*cm, 5*cm]
        )

    # Add the table to the list of flowables to draw
    elements.append(t)

    # Render the page to the canvas and save it.
    doc.build(elements)

if __name__ == '__main__':
    main()
