[reportlab-users] Re: Large tables - exponentially increasing run times

Henning von Bargen reportlab-users@reportlab.com
Wed, 19 May 2004 16:28:05 +0200

I used the Python profiler to analyze what's causing the exponential run
times.
I ran the profiler for tables with 100, 200, 300 and 400 rows.
To my surprise I found that the number of calls to CellStyle1.__init__
increases quadratically with the number of rows.
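
A minimal sketch of how such call counts can be collected with the
profiler from the standard library. It assumes a current Python with
cProfile and pstats; the CellStyle class and build_table function here
are hypothetical stand-ins, not the ReportLab code.

```python
import cProfile
import pstats


class CellStyle:
    """Stand-in for a per-cell style object (hypothetical, not ReportLab's class)."""

    def __init__(self, name):
        self.name = name


def build_table(nrows, ncols=3):
    # One style object per cell, like the nested loops in Table.__init__.
    return [[CellStyle(repr((i, j))) for j in range(ncols)] for i in range(nrows)]


def count_init_calls(nrows):
    """Profile build_table(nrows) and return the number of __init__ calls."""
    profiler = cProfile.Profile()
    profiler.enable()
    build_table(nrows)
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, function name) -> (cc, nc, tt, ct, callers),
    # where nc is the total number of calls.
    return sum(v[1] for k, v in stats.stats.items() if k[2] == "__init__")


if __name__ == "__main__":
    for n in (100, 200, 300, 400):
        print(n, count_init_calls(n))
```

Running the same measurement for several table sizes and comparing the
counts shows whether the number of calls grows linearly or faster.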

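As Robin pointed out, during the splitting new tables
with a cumulated row length of about N*p are created
(N being the total number of rows, p the number of pages
and n the number of rows per page).
The problem is that p*n ~ N => p ~ N/n,
thus the algorithm creates tables with about N*N/n = O(N^2) rows in total.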
What causes trouble now is that during Table.__init__,
individual instances of CellStyle are created (one for each cell).
That's why we have the quadratically increasing number of calls to
CellStyle1.__init__ .
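
The argument above can be checked with a small counting model. This is
only a sketch under an assumed model of the splitting (each split builds
one new table for the rows that fit on the page and one for all remaining
rows); the function name is made up for illustration.

```python
def cells_created_by_splitting(total_rows, rows_per_page, ncols=1):
    """Count per-cell style objects built while paginating by repeated splits.

    Assumed model: every new table creates one style object per cell, and
    each split creates two new tables (head = one page, tail = the rest).
    """
    created = total_rows * ncols  # the initial table
    remaining = total_rows
    while remaining > rows_per_page:
        head = rows_per_page
        tail = remaining - rows_per_page
        created += (head + tail) * ncols  # both halves are rebuilt
        remaining = tail
    return created


if __name__ == "__main__":
    for n_rows in (1000, 2000, 4000):
        print(n_rows, cells_created_by_splitting(n_rows, 50))
```

In this model, doubling the number of rows roughly quadruples the number
of style objects created, which matches the quadratic growth seen in the
profiler.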

Looking at the code, I found that for the Table.__init__ calls from inside
splitRows, the whole bunch of CellStyles is superfluous, because the
_cellStyles will be overwritten a few lines later.

Now I modified the code so that the cellStyles can be supplied to
Table.__init__ as an optional argument. The nested loops are only used if
we didn't supply the argument.
The new code in splitRows supplies the argument, which leads to a
dramatically better performance for large tables.
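
The idea, reduced to a toy example. This is a simplified sketch, not the
actual tables.py code: the Table class here is hypothetical, split_rows
stands in for the real splitting method, and repeatRows and all other
options are left out.

```python
class CellStyle:
    """Stand-in for the per-cell style object."""

    def __init__(self, name):
        self.name = name


class Table:
    """Toy table (hypothetical) showing only the optional-argument idea."""

    def __init__(self, data, cellStyles=None):
        self._cellvalues = data
        if cellStyles is None:
            # Only build one style per cell when the caller supplied none.
            nrows = len(data)
            ncols = len(data[0]) if data else 0
            cellStyles = [[CellStyle(repr((i, j))) for j in range(ncols)]
                          for i in range(nrows)]
        self._cellStyles = cellStyles

    def split_rows(self, n):
        # Hand the existing styles to the new tables instead of letting
        # __init__ create fresh ones that would be overwritten anyway.
        data = self._cellvalues
        R0 = Table(data[:n], cellStyles=self._cellStyles[:n])
        R1 = Table(data[n:], cellStyles=self._cellStyles[n:])
        return R0, R1
```

With this shape, a split creates two new row lists but no new CellStyle
instances, so the per-cell work happens once for the whole table.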

Here are the changes I made (sorry, no diff, I just compared using TextPad
on MS Windows):
Comparing (<)C:\reportlab\hvb-1.19\reportlab\platypus\tables.py.orig
(61803 bytes)
 with (>)C:\reportlab\hvb-1.19\reportlab\platypus\tables.py (61869 bytes)

<                 repeatRows=0, repeatCols=0, splitByRow=1,
>                 repeatRows=0, repeatCols=0, splitByRow=1,
<         for i in range(nrows):
<             cellcols = []
<             for j in range(ncols):
<                 cellcols.append(CellStyle(`(i,j)`))
<             cellrows.append(cellcols)
<         self._cellStyles = cellrows
>         if cellStyles is None:
>             cellrows = []
>             for i in range(nrows):
>                 cellcols = []
>                 for j in range(ncols):
>                     cellcols.append(CellStyle(`(i,j)`))
>                 cellrows.append(cellcols)
>             self._cellStyles = cellrows
>         else:
>             self._cellStyles = cellStyles
<                 splitByRow=splitByRow)
<         #copy the styles and commands
<         R0._cellStyles = self._cellStyles[:n]
>                 splitByRow=splitByRow, cellStyles=self._cellStyles[:n])
<                     splitByRow=splitByRow)
<             R1._cellStyles =
>                     splitByRow=splitByRow,
<                     splitByRow=splitByRow)
<             R1._cellStyles = self._cellStyles[n:]
>                     splitByRow=splitByRow,

I didn't test all features like multiBuild etc.,
but I did test using a table with repeatRows and a different style for the
first line, which worked as expected.

Here is a comparison of the run times on my machine
(Pentium 4 2 GHz, 512 MB RAM, WinXP Home, Python 2.3, RLTK 1.19).
I called tables.py (my modified version) and the original version
for values of 2000, 4000, and 6000 rows in the LongTable test.
I measured the results roughly by watching the Windows Task Manager.

The values are CPU time [seconds] / Memory Consumption[MB]
Rows | original | modified
2000 | 12/21   | 8/21
4000 | 35/34   | 18/35
6000 | 72/45   | 28/45

While the memory consumption seems to be nearly identical,
the run time is more or less linear with the new version,
but quadratic with the original version.
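
A rough way to put a number on this is the slope of log(time) against
log(rows) for the CPU times in the table above. The helper below is only
an illustrative sketch; with three coarse Task Manager readings the
result is an indication, not a measurement.

```python
import math

rows = [2000, 4000, 6000]
cpu_seconds = {"original": [12, 35, 72], "modified": [8, 18, 28]}


def growth_exponent(sizes, times):
    """Slope of log(time) versus log(size) between the first and last point."""
    return math.log(times[-1] / times[0]) / math.log(sizes[-1] / sizes[0])


exponents = {name: growth_exponent(rows, t) for name, t in cpu_seconds.items()}

if __name__ == "__main__":
    for name, k in exponents.items():
        print(f"{name}: time grows roughly like rows^{k:.2f}")
```

An exponent near 1 means linear growth and one near 2 means quadratic
growth. For these figures the original version comes out at roughly 1.6
and the modified version at roughly 1.1, which fits a quadratic term on
top of linear work in the original and nearly linear behaviour in the
new version.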

If someone has complex tables (several pages long), please test it.