[reportlab-users] formatting long tables

Henning von Bargen reportlab-users@reportlab.com
Thu, 1 May 2003 00:37:11 +0200


Hi,

I tested formatting a table with 5000 rows like this:
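    # names used below; in tables.py's test() they are already in scope
    from reportlab.lib import colors
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.platypus import Paragraph, Table
    styleSheet = getSampleStyleSheet()

    sty = [('GRID', (0,0), (-1,-1), 1, colors.green),
           ('BOX', (0,0), (-1,-1), 2, colors.red),
          ]
    # 5000 rows: a row number plus two paragraphs of varying length
    data = [[str(i),
             Paragraph("xx " * (i%10), styleSheet["BodyText"]),
             Paragraph("blah " * (i%40), styleSheet["BodyText"]),
            ] for i in range(5000)]
    t = Table(data, style=sty, colWidths=[50,100,200])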

The original tables.py needed 13 minutes to build the PDF on my
computer (Pentium 4 2.2GHz, Windows XP).

An idea I mentioned earlier was to calculate only the row heights that
are really needed instead of those of the whole table, which should
reduce the complexity from O(nRows^2) to O(nRows), at least in theory.
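
A minimal sketch of that idea, not the actual tables.py code: it only
counts how many row-height measurements each strategy performs while a
long table is split into pages. All names here (row_height,
measure_eager, measure_lazy, paginate) are hypothetical, and the stop
condition is simplified compared to the patch below.

```python
def measure_eager(row_height, start, n_rows, avail_height):
    """Measure every remaining row, then count how many fit."""
    heights = [row_height(i) for i in range(start, n_rows)]
    total = 0
    fit = 0
    for h in heights:
        if total + h > avail_height:
            break
        total += h
        fit += 1
    return fit, len(heights)


def measure_lazy(row_height, start, n_rows, avail_height):
    """Measure rows one at a time and stop once the page is full."""
    total = 0
    fit = 0
    measured = 0
    for i in range(start, n_rows):
        h = row_height(i)
        measured += 1
        if total + h > avail_height:
            break
        total += h
        fit += 1
    return fit, measured


def paginate(row_height, n_rows, avail_height, measure):
    """Split n_rows into pages; return (pages, total height measurements)."""
    start = 0
    pages = 0
    measurements = 0
    while start < n_rows:
        fit, measured = measure(row_height, start, n_rows, avail_height)
        measurements += measured
        start += max(fit, 1)  # always place at least one row so the loop ends
        pages += 1
    return pages, measurements


if __name__ == "__main__":
    fixed = lambda i: 10  # assumption: every row is 10 points high
    print(paginate(fixed, 5000, 100, measure_eager))
    print(paginate(fixed, 5000, 100, measure_lazy))
```

Both strategies give the same page count; the eager one re-measures
all remaining rows for every page, while the lazy one measures only
about one page's worth of rows each time.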

I have now tried this by hacking tables.py.
The result: the PDF for the 5000-row table builds in only 73 seconds
on the same computer, with the same output (a 298-page PDF file).

Unfortunately, here at home, I don't have diff, only TextPad.

I could send the complete tables.py file to someone (to whom?),
but now it's time to go to bed (in Germany).
Maybe my changes can be included in an "official" reportlab release?

TODO: Test the results with test_platypus_table.py;
so far I have only tested with the test() function in tables.py.

Have fun,
Henning von Bargen

This is the output of the file comparison inside TextPad
(not including the 5000 rows table test, see above):

Comparing (<)C:\Python22\Lib\reportlab\platypus\tables.py (60214 bytes)
 with (>)C:\Python22\Lib\reportlab\platypus\hvbtables.py (61006 bytes)

306,307c306,308
<     def _calc_height(self):
<
---
>     def _calc_height(self, availHeight):
>
>         #print "start of calc_heights, H=", self._argH, "availHeight=", availHeight
321a322,323
>         cntcalc = 0
>         self._Hmax = len(H)
325,326c328,338
<             while None in H:
<                 i = H.index(None)
---
>
>             while None in H:
>                 i = H.index(None)
>                 # we can stop if we have filled up all available room
>                 self._Hmax = i
>                 heightUpToNow = reduce(operator.add, H[:i], 0)
>                 if heightUpToNow > availHeight:
>                     #print "breaking with Hmax=%d" % i
>                     break
>                 #print "calculating row#%d" % i
>                 cntcalc += 1
357,361c369,376
<
<         height = self._height = reduce(operator.add, H, 0)
<         #print "height, H", height, H
<         self._rowpositions = [height]    # index 0 is actually topline; we skip when processing cells
<         for h in H:
---
>                 self._Hmax = len(H)
>
>             #print "height calculated for %d rows" % cntcalc
>
>         height = self._height = reduce(operator.add, H[:self._Hmax], 0)
>         #print "height, H", height, H
>         self._rowpositions = [height]    # index 0 is actually topline; we skip when processing cells
>         for h in H[:self._Hmax]:
366,367c381,384
<     def _calc(self, availWidth, availHeight):
<         if hasattr(self,'_width'): return
---
>         #print "end of calc_heights, H=", H
>
>     def _calc(self, availWidth, availHeight):
>         #if hasattr(self,'_width'): return
385c402
<         self._calc_height()
---
>         self._calc_height(availHeight)
704a721
>         #print "wrap", availWidth, availHeight
748,749c766,768
<         lim = len(self._rowHeights)
<         while n<lim:
---
>         #print "in _splitRows, availHeight=%d, _rowHeights=%s" % (availHeight, self._rowHeights)
>         lim = len(self._rowHeights)
>         while n<self._Hmax:
817d836
<                     self._argH[:repeatRows]+self._argH[n:],
825c843
<             R1 = Table(data[n:], self._colWidths, self._argH[n:],
---
>             R1 = Table(data[n:], self._colWidths,
1423d1441
<
1434c1451
<     SimpleDocTemplate('tables.pdf', showBoundary=1).build(lst)
---
>     SimpleDocTemplate('hvbtables.pdf', showBoundary=1).build(lst)