[reportlab-users] Spreadsheet Table

Tomasz Świderski contact at tomaszswiderski.com
Tue Feb 23 08:48:03 EST 2010


You are right. Copying rowHeights in __init__ seems unnecessary. I'm not
sure why I did that.

This LongTable stuff seems very strange to me. I understand how it works,
but I have no idea why it is used. In the old implementation a Table
instance calculates rowHeights, spanRanges, nosplitRanges etc. each time
the Table is split, because the Table's data changes. The LongTable stuff
prevents unnecessary rowHeights calculations: it stops calculating when it
detects that no more rows will fit on the current page (so there is no
point in calculating more row heights). That makes sense because the
calculated rowHeights are not reused. But why are they not reused in the
first place? It is easier to calculate all rowHeights once and pass them
to the split Tables in the _splitRows method. This way a Table has to
calculate row heights only once and the LongTable optimization is
unnecessary.
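
A minimal sketch of that idea, not the ReportLab code: ToyTable and its
methods are hypothetical names invented for illustration, and the height
rule is a placeholder. Each row is measured once, and a split hands the
already computed heights to both parts so nothing is measured again.

```python
class ToyTable:
    """Hypothetical stand-in for a table; not ReportLab's Table class."""

    measure_calls = 0  # counts how many rows were actually measured

    def __init__(self, rows, row_heights=None):
        self.rows = list(rows)
        if row_heights is None:
            # Heights are measured only when the caller did not supply them.
            row_heights = [self._measure(row) for row in self.rows]
        self.row_heights = list(row_heights)

    @classmethod
    def _measure(cls, row):
        # Placeholder for the expensive part: a row is 12 points tall
        # per text line of its tallest cell.
        cls.measure_calls += 1
        return 12 * max(str(cell).count("\n") + 1 for cell in row)

    def split(self, avail_height):
        """Return [] if no row fits, [self] if all rows fit,
        otherwise [head, tail] where head fits into avail_height."""
        used = 0
        n = 0
        for height in self.row_heights:
            if used + height > avail_height:
                break
            used += height
            n += 1
        if n == 0:
            return []
        if n == len(self.rows):
            return [self]
        # Both parts receive slices of the heights computed earlier.
        head = ToyTable(self.rows[:n], self.row_heights[:n])
        tail = ToyTable(self.rows[n:], self.row_heights[n:])
        return [head, tail]
```

With this shape the measuring cost is paid once per row no matter how many
pages the table is split across, which is the reason an early-exit
optimization stops being necessary.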

I can guess the intention behind this repeated row height calculation. If
a table has variable-width elements like paragraphs, it is possible to
shrink the column widths when the next frame is not as wide as the
previous one. If the column widths shrink, heights must be recalculated.
BUT this situation will never happen, because on split the Table
implementation passes _colWidths instead of _argW! Just look at this code
snippet from the Table _splitRows method:

    R1 = self.__class__(data[:repeatRows]+data[n:],colWidths=self._colWidths,
            rowHeights=self._argH[:repeatRows]+self._argH[n:],
            repeatRows=repeatRows, repeatCols=repeatCols,
            splitByRow=splitByRow)

On split, Table creates the new parts by passing _colWidths and _argH.
Since _colWidths contains fixed column widths (calculated by _calc_widths),
recalculating row heights makes no sense to me, even for variable-size
elements like paragraphs. Or maybe I just don't understand something.

I removed the LongTable stuff in my implementation. I calculate row heights
once and reuse them after a split. My implementation can reuse most of the
Table's internal state (everything except rowpositions, colpositions,
_spanRects, _vBlocks and _hBlocks). This should give some performance boost
when dealing with spans or nosplits.

It is hard to compare the performance of the implementations, since
reportlab 2.4 does not contain this patch:
http://two.pairlist.net/pipermail/reportlab-users/2010-February/009275.html
but my spreadsheet implementation does. So I decided to compare 3
versions: spreadsheet, reportlab 2.4 with the patch (optimizedlongtable),
and reportlab 2.4 without the patch. Results below:

SpreadsheetTable generation time (1000 rows): 2.19117712975.
OptimizedLongTable generation time (1000 rows): 1.98596405983.
LongTable generation time (1000 rows): 2.94729304314.

SpreadsheetTable generation time (2000 rows): 5.23209905624.
OptimizedLongTable generation time (2000 rows): 4.06802105904.
LongTable generation time (2000 rows): 7.60760498047.

SpreadsheetTable generation time (3000 rows): 9.08862996101.
OptimizedLongTable generation time (3000 rows): 6.21168804169.
LongTable generation time (3000 rows): 14.6669168472.

SpreadsheetTable generation time (4000 rows): 13.6766881943.
OptimizedLongTable generation time (4000 rows): 8.18623518944.
LongTable generation time (4000 rows): 23.6615509987.

SpreadsheetTable generation time (5000 rows): 19.132267952.
OptimizedLongTable generation time (5000 rows): 10.3774158955.
LongTable generation time (5000 rows): 35.2170841694.

SpreadsheetTable generation time (6000 rows): 26.460157156.
OptimizedLongTable generation time (6000 rows): 13.2904510498.
LongTable generation time (6000 rows): 55.368956089.

SpreadsheetTable generation time (7000 rows): 38.1424150467.
OptimizedLongTable generation time (7000 rows): 17.2318229675.
LongTable generation time (7000 rows): 76.6997680664.

SpreadsheetTable generation time (8000 rows): 47.3637280464.
OptimizedLongTable generation time (8000 rows): 19.8427381516.
LongTable generation time (8000 rows): 100.836438894.

SpreadsheetTable generation time (9000 rows): 60.7862567902.
OptimizedLongTable generation time (9000 rows): 23.2700841427.
LongTable generation time (9000 rows): 134.000416994.
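
For reference, timings of this kind can be collected with a small harness
like the one below. This is only a sketch: time_build and fake_build are
hypothetical names, the workload is a stand-in, and it is not the script
that produced the numbers above.

```python
import time


def time_build(build, row_counts):
    """Call build(n) for each n and return {n: elapsed seconds}."""
    results = {}
    for n in row_counts:
        start = time.perf_counter()
        build(n)
        results[n] = time.perf_counter() - start
    return results


def fake_build(n):
    # Stand-in workload; a real run would build and render a table here.
    return [[str(i), str(i * 2)] for i in range(n)]


for n, seconds in time_build(fake_build, [1000, 2000]).items():
    print("generation time (%d rows): %s." % (n, seconds))
```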

As you can see, reportlab 2.4 with the patch is faster. This is because the
spreadsheet implementation rewrites line and background commands. I
believe it can be improved: I used the simplest possible way just to get it
working.

For example, a snippet from the drawbackground method:

    visible = []
    for row_num in xrange(sr, er + 1):
        if not self._is_visible_row(row_num):
            continue
        visible.append(row_num)

So it calls _is_visible_row once for every row if a background command
spans all the data :P It can be easily improved; it's just a matter of time
and effort.
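
One possible improvement, sketched under assumptions: if the visible row
numbers are collected once into a sorted list, the rows inside a command's
range can be cut out with bisect instead of testing every row again for
every command. visible_rows_in_range and the precomputed list are
hypothetical names, not part of the actual implementation.

```python
from bisect import bisect_left, bisect_right


def visible_rows_in_range(sorted_visible, sr, er):
    """Return the visible row numbers r with sr <= r <= er.

    sorted_visible must be an ascending list of visible row numbers,
    built once (for example by a single pass over _is_visible_row).
    """
    lo = bisect_left(sorted_visible, sr)   # first index with value >= sr
    hi = bisect_right(sorted_visible, er)  # first index with value > er
    return sorted_visible[lo:hi]
```

The per-command cost then depends on the number of visible rows in the
range plus two binary searches, not on the full length of the range.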

I'm currently between jobs, so I will probably find time to improve
performance. There is still a lot of room for improvement :) I believe
I can make some improvements to span commands; the current implementation
is very slow. A comparison of the spreadsheet implementation with patched
reportlab 2.4 is below:

SpreadsheetTable generation time with span (1000 rows): 4.28560996056.
OptimizedTable generation time with span (1000 rows): 9.91152501106.

SpreadsheetTable generation time with span (2000 rows): 14.6250700951.
OptimizedTable generation time with span (2000 rows): 48.9020631313.

SpreadsheetTable generation time with span (3000 rows): 32.16908288.
OptimizedTable generation time with span (3000 rows): 144.50296998.


Best regards,
Tomasz Świderski

P.S. Reportlab 2.4 LongTable breaks span commands on split :(


