[reportlab-users] Re: Formatting long tables

Thu, 1 May 2003 17:34:34 +0200

> Well if we don't restrict to fixed size columns we could get the
> following problem even with constant width frames. Establish widths with
> the first page, then we either allow for varying column sizes on
> succeeding pages or we fix the column widths. In the latter case we
> might find that even though a feasible column width assignment exists
> that the quick and dirty method leads to an overflow on later pages. So
> it seems for auto-sizing there's no easy solution without varying column
> widths.
>

I make the following assumption for long tables:

The available width is the same in all frames where the table is going
to be rendered.
Otherwise your layout is a mess anyway (but you could still fix all column
sizes).

When auto-calculating column widths:
In a table with M columns, assume that
the width is not specified for m columns.

The existing approach (as I understand it)
is to divide the remaining available width by m,
so that all unspecified column widths are equal
and the table fills the available width exactly.

This is done in the calc function for every wrap().
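
To make the equal-split rule concrete, here is a minimal sketch of how I
understand it. The function name and the use of None for "width not
specified" are my own assumptions for illustration, not the actual code
in the calc function.

```python
def split_remaining_width(avail_width, col_widths):
    """Give every column whose width is None an equal share of the leftover width."""
    specified = sum(w for w in col_widths if w is not None)
    unspecified = sum(1 for w in col_widths if w is None)
    if unspecified == 0:
        # Nothing to auto-size: all widths were given by the user.
        return list(col_widths)
    share = (avail_width - specified) / unspecified
    return [share if w is None else w for w in col_widths]


# M = 4 columns, m = 2 of them unspecified, 400 points available.
print(split_remaining_width(400, [100, None, None, 60]))
```

The sketch does not guard against the specified widths already exceeding
the available width; in that case the share simply becomes negative.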

Auto-sizing of column widths should happen
only once, the first time wrap() is called.
From then on, all column widths should be considered fixed.
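
A rough sketch of the "size once, then freeze" idea follows. The class
below is a hypothetical stand-in, not ReportLab's Table; the names
AutoSizedTable, _auto_size and calc_count are invented for this example,
and the sizing policy is just the equal split as a placeholder.

```python
class AutoSizedTable:
    """Hypothetical stand-in for a table flowable (not ReportLab's Table)."""

    def __init__(self, col_widths):
        self.col_widths = list(col_widths)  # None marks an unspecified width
        self._widths_fixed = False
        self.calc_count = 0                 # how often auto-sizing really ran

    def _auto_size(self, avail_width):
        # Placeholder policy: equal split of the leftover width.
        self.calc_count += 1
        fixed = sum(w for w in self.col_widths if w is not None)
        open_cols = [i for i, w in enumerate(self.col_widths) if w is None]
        for i in open_cols:
            self.col_widths[i] = (avail_width - fixed) / len(open_cols)

    def wrap(self, avail_width, avail_height):
        # Auto-size on the first call only; later calls reuse the widths.
        if not self._widths_fixed:
            self._auto_size(avail_width)
            self._widths_fixed = True
        # Row heights are out of scope here, so the height is a dummy 0.
        return sum(self.col_widths), 0


t = AutoSizedTable([50, None, None])
t.wrap(250, 700)   # first frame: widths get computed
t.wrap(400, 700)   # later frame: widths stay as they are
```

With this, a second wrap() with a different available width leaves the
column widths untouched, which is the behaviour I am arguing for.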

The usual case when auto-sizing is that line-breaking is necessary.
One could give each column a "weight" as well.
One could automagically determine the weight using all the data
(this is quite expensive for long tables), but I think this should
only be done when the user EXPLICITLY wants it;
my idea is to consider at most 300 rows in the default case.

An autosizing algorithm (not taking care of "span") could work like this:

cntNonEmptyCells = [0] * nColumns
minWidths = [None] * nColumns
maxWidths = [None] * nColumns
# running sum of the cell widths seen so far, per column
sumWidths = [0] * nColumns
# running sum of the squared widths, needed for the standard deviation
sumSquares = [0] * nColumns

# We don't necessarily look at all rows when auto-sizing the column widths.
# If rowsToConsider was not given by the user as an argument on table
# creation, we choose some on our own.
# Remark: considering the whole table is possible by setting
# rowsToConsider = range(nRows)
if not rowsToConsider:
    if nRows <= 200:
        rowsToConsider = range(nRows)
    else:
        # If there are more than 200 rows in the table, we choose the
        # first 100 plus 200 random ones.
        rowsToConsider = list(range(100)) + choose200DifferentOutOf(100, nRows)
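
The row selection could be written as one helper along these lines. This
is only a sketch: choose_rows_to_consider and its parameters are names I
made up, and it uses random.sample from the standard library to pick
distinct rows. If fewer than 200 rows remain after the first 100, it
simply takes all of them.

```python
import random


def choose_rows_to_consider(n_rows, head=100, n_random=200, rng=random):
    """First `head` rows plus up to `n_random` distinct random later rows."""
    if n_rows <= head:
        return list(range(n_rows))
    tail = range(head, n_rows)
    # Never ask for more distinct rows than the tail actually has.
    k = min(n_random, len(tail))
    return list(range(head)) + sorted(rng.sample(tail, k))


rows = choose_rows_to_consider(5000)
print(len(rows))   # at most head + n_random rows are ever looked at
```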

for r in rowsToConsider:
    for c in columns_with_unspecified_width:
        cellContent = data[r][c]  # assumed: cells are stored as data[row][col]
        if cellContent is None:
            continue
        # Compute the cell width cw and cell height ch; try to put the
        # entire cell content on one line for "unsizable" objects like
        # paragraphs.
        # hWWYLTBIYHATPITW =
        #   howWideWouldYouLikeToBeIfYouHadAllThePlaceInTheWorld
        (cw, ch) = hWWYLTBIYHATPITW(cellContent)
        cntNonEmptyCells[c] += 1
        sumWidths[c] += cw
        sumSquares[c] += cw * cw
        if maxWidths[c] is None or cw > maxWidths[c]:
            maxWidths[c] = cw
        if minWidths[c] is None or cw < minWidths[c]:
            minWidths[c] = cw
# Now we know how wide each cell would be if it had all the place in the
# world, for those rows we considered.
# For short tables, this means we considered the whole table.
# For longer tables, we looked at 100 consecutive rows and max. 200 random
# rows, so we have quite a good impression of the table by computing at
# most 300 rows.
# We also know what the minimum, maximum and average width of each
# column's content is, and we could compute the standard deviation (if we
# had the formula at hand).
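
For the record, the standard deviation does follow from the running sums
collected above: with n = cntNonEmptyCells[c], the mean is sumWidths[c] / n
and the (population) variance is sumSquares[c] / n - mean * mean. A small
sketch, where column_stats is a name invented for this example:

```python
import math


def column_stats(count, total, total_sq):
    """Mean and population standard deviation from running sums."""
    if count == 0:
        return None, None          # column had no non-empty cells
    mean = total / count
    # Clamp at zero: rounding can push the difference slightly negative.
    variance = max(total_sq / count - mean * mean, 0.0)
    return mean, math.sqrt(variance)


widths = [40.0, 60.0, 80.0]
mean, std = column_stats(len(widths), sum(widths), sum(w * w for w in widths))
print(mean, std)
```

This one-pass formula can lose precision when the widths are large and
nearly identical, but for a layout heuristic that should not matter.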

# If possible, avoid line-breaking.
# (a column whose considered cells were all empty counts as width 0)
sumMaxWidths = sum([maxWidths[c] or 0 for c in columns_with_unspecified_width])
sumSpecWidths = sum([specWidth[c] for c in columns_with_specified_width])
if sumMaxWidths + sumSpecWidths <= availWidth:
    # Good, as far as we know, we don't need line-breaking
    # We could now either leave the table as narrow as possible, or
    # stretch it.
    # For now, leave it as narrow as possible.
    # From now on, all columns have fixed size.
    for c in columns_with_unspecified_width:
        specWidth[c] = maxWidths[c]
    columns_with_specified_width = range(nColumns)
    columns_with_unspecified_width = []
else:
    # Columns do not fit without line-breaking.
    # The trouble is that there is no definition of the optimal width
    # for a table column, since the table height and width depend on each
    # other.
    # So how wide should each column be?
    Compute_Columns_Weights_Using_Statistical_Data(nRows, cntNonEmptyCells,
                    minWidths, maxWidths, sumWidths, sumSquares)
    # Maybe the collected statistical data can help us here.
    # For example, when maxWidths[c] is very small compared to the other
    # columns, then we should set specWidth[c] = maxWidths[c];
    # the same is true if maxWidths[c] is close to avgWidth[c].
    # For the other columns, use avgWidth as weight.
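
One possible reading of those hints, as a sketch only: pin the "easy"
columns to their maximum width and share what is left among the others in
proportion to their average width. The function name and the two
thresholds (small, close) are arbitrary assumptions of mine, and
avail_width here means the width left over for the unspecified columns.

```python
def distribute_widths(avail_width, max_widths, avg_widths, small=0.25, close=1.2):
    """Hypothetical heuristic: pin easy columns, weight the rest by average width."""
    n = len(max_widths)
    widest = max(max_widths)
    widths = [None] * n
    for c in range(n):
        is_small = max_widths[c] <= small * widest         # tiny next to the widest
        is_uniform = max_widths[c] <= close * avg_widths[c]  # max close to average
        if is_small or is_uniform:
            widths[c] = max_widths[c]
    remaining = avail_width - sum(w for w in widths if w is not None)
    flexible = [c for c in range(n) if widths[c] is None]
    total_weight = sum(avg_widths[c] for c in flexible)
    for c in flexible:
        widths[c] = remaining * avg_widths[c] / total_weight
    return widths


print(distribute_widths(300, [30, 400, 200, 52], [25, 150, 50, 50]))
```

In the example the first and last columns keep their maximum width, and
the two wide columns split the remaining 218 points 3:1. The sketch does
not handle the case where the pinned columns alone already overflow the
available width.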