[reportlab-users] Re: Large tables - exponentially increasing
run times
Robin Becker
reportlab-users@reportlab.com
Mon, 17 May 2004 12:09:43 +0100
Craig Ringer wrote:
......
>
> The main thing that confuses me is why the Table instance being split
> creates an entire new Table instance rather than trimming data (and
> styles, etc) off its self and returning [R0, self]. I presume it comes
> down to the readability and maintainability issues mentioned above.
The answer to this is fairly complex. First off the lead table and its first
split offspring need to coexist. In an ideal world we could modify the original
table in place, but this is python and if we're doing multi-pass platypus then
we would be altering the original data and thus confusing the second pass.
Of course there is only one original table so there is no point in keeping the
original when we come to split the second and later parts of a large table. That
way at least we only double the number of rows held. I'm fairly sure that
optimization hasn't been done so in practice we're holding considerably more.
eg if N=original number of rows, p the rows per page and n the number of pages
we get
Page Tables Display Rows
1: T0 [T1,T2] T1 2N
2: T2 [T3,T4] T3 2(N-p)
3: T4 [T5,T6] T5 2(N-2p)
.....
n: T[2*(n-1)] T[2*(n-1)] 2(N-(n-1)p)
In each row k except the last we have the number of rows is twice the number in
the first column table and for these the number decreases by the rows per page.
I make that 2 (N +p ((n-1)*n)/2)) = 2N + p*(n-1)n which if I make the
approximation pn ~ N gives us N(2+n-1). Which could be much larger than what is
required. Plus we have 2n-1 tables.
If we make the optimization mentioned above then we only double the rows in the
first split.
Page Tables Display Rows
1: T0 [T1,T2] T1 2N
2: T2 [T2,T3] T3 0 counted in row 1
3: T3 [T3,T4] T5
.....
n: T[n] T[n] 0
so here we have 2N rows & n+1 tables. Clearly this optimization is worth making.
--
Robin Becker