[reportlab-users] BUGFIX: Re: &nbsp; in paragraph

Dirk Holtwick dirk.holtwick at gmail.com
Fri Dec 5 06:17:45 EST 2008


Hi Robin,

I did not think about speed ;) You are absolutely right that we should
just handle the special case regarding u"\xa0" with our own whitespace
table.
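
For illustration, the special-case handling could look roughly like the sketch below. It is written for Python 3 (where str is already unicode), the whitespace table is a heavily abbreviated, hypothetical stand-in for the real one, and the names _WS_CHARS, _ws_split and split_keep_nbsp are my assumptions, not the actual ReportLab code.

```python
import re

# Hypothetical, abbreviated whitespace table. The point is only that
# "\xa0" (NO-BREAK SPACE) is deliberately left out of it.
_WS_CHARS = " \t\n\r\x0b\x0c\u2028\u2029\u3000"
_ws_split = re.compile("[%s]+" % _WS_CHARS).split


def split_keep_nbsp(text, delim=None):
    """Split text into words without breaking at non-breaking spaces."""
    if delim is None and "\xa0" in text:
        # Rare case: the built-in split would also break at "\xa0",
        # so fall back to the slower regex built from our own table.
        return [word for word in _ws_split(text) if word]
    # Common case: no nbsp present, the fast built-in split is enough.
    return text.split(delim)


print(split_keep_nbsp("10\xa0km to go"))
print(split_keep_nbsp("plain words only"))
```

The membership test is cheap compared with a regex split, which is why the common case stays close to the speed of the built-in split.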

> Are you suggesting that our table is more comprehensive than the unicode
> default argument set? I got it from the C code that implements the
> unicodectype so I hope it is the same for unicode.split.

I don't know, I just trusted your data ;)

Another thing I would suggest is to rename the functions "split" and
"strip" to something like "split_" or "customSplit" to avoid confusion
with the functions from the "string" module.

Cheers
Dirk

Robin Becker wrote:

> Dirk Holtwick wrote:
> ...........
>>
>> I tested it and it works fine. Another suggestion is not to test for
>> "\xa0" any more, to profit from the more elaborate whitespace table
>> for usual cases. Here is my modification:
>>
>> -----------------8<---------------[cut here]
>> def split(text, delim=None):
>>     if type(text) is str:
>>         text = text.decode('utf8')
>>     if type(delim) is str:
>>         delim = delim.decode('utf8')
>>     elif delim is None:
>>         return [uword.encode('utf8') for uword in _wsc_re_split(text)]
>>     return [uword.encode('utf8') for uword in text.split(delim)]
>> -----------------8<---------------[cut here]
>>
>> Dirk
> .........
>

> Unfortunately that version suffers in speed for the common case when no
> \xa0 is present. Below are my timings for my split and Dirk's (which I
> called _plit so the names are the same length in case that altered the
> timing somehow).
>
> common case, no nbsp:

>> C:\code\reportlab\platypus>python \python\lib\timeit.py -s "from
>> reportlab.platypus.paragraph import split" "split(u'The
>> difference in default timer function is because on Windows, clock()
>> has microsecond granularity but time()\'s granularity is 1/60th of a
>> second; on Unix, clock() has 1/100th of a second granularity and
>> time() is much more precise. On either platform, the default timer
>> functions measure wall clock time, not the CPU time. This means that
>> other processes running on the same computer may interfere with the
>> timing. The best thing to do when accurate timing is necessary is to
>> repeat the timing a few times and use the best time. The -r option is
>> good for this; the default of 3 repetitions is probably enough in most
>> cases. On Unix, you can use clock() to measure CPU time.')"
>> 10000 loops, best of 3: 173 usec per loop
>>

>> C:\code\reportlab\platypus>python \python\lib\timeit.py -s "from
>> reportlab.platypus.paragraph import _plit" "_plit(u'The
>> difference in default timer function is because on Windows, clock()
>> has microsecond granularity but time()\'s granularity is 1/60th of a
>> second; on Unix, clock() has 1/100th of a second granularity and
>> time() is much more precise. On either platform, the default timer
>> functions measure wall clock time, not the CPU time. This means that
>> other processes running on the same computer may interfere with the
>> timing. The best thing to do when accurate timing is necessary is to
>> repeat the timing a few times and use the best time. The -r option is
>> good for this; the default of 3 repetitions is probably enough in most
>> cases. On Unix, you can use clock() to measure CPU time.')"
>> 1000 loops, best of 3: 233 usec per loop
>>

>
> less common case, one nbsp: both take about the same time. Dirk's
> version is slightly faster here than in the common case, presumably
> because the one nbsp reduces the number of matches.
>

>> C:\code\reportlab\platypus>python \python\lib\timeit.py -s "from
>> reportlab.platypus.paragraph import split" "split(u'The
>> difference in default timer function is because on Windows, clock()
>> has microsecond granularity but\xa0time()\'s granularity is 1/60th of
>> a second; on Unix, clock() has 1/100th of a second granularity and
>> time() is much more precise. On either platform, the default timer
>> functions measure wall clock time, not the CPU time. This means that
>> other processes running on the same computer may interfere with the
>> timing. The best thing to do when accurate timing is necessary is to
>> repeat the timing a few times and use the best time. The -r option is
>> good for this; the default of 3 repetitions is probably enough in most
>> cases. On Unix, you can use clock() to measure CPU time.')"
>> 1000 loops, best of 3: 230 usec per loop
>>

>> C:\code\reportlab\platypus>python \python\lib\timeit.py -s "from
>> reportlab.platypus.paragraph import _plit" "_plit(u'The
>> difference in default timer function is because on Windows, clock()
>> has microsecond granularity but\xa0time()\'s granularity is 1/60th of
>> a second; on Unix, clock() has 1/100th of a second granularity and
>> time() is much more precise. On either platform, the default timer
>> functions measure wall clock time, not the CPU time. This means that
>> other processes running on the same computer may interfere with the
>> timing. The best thing to do when accurate timing is necessary is to
>> repeat the timing a few times and use the best time. The -r option is
>> good for this; the default of 3 repetitions is probably enough in most
>> cases. On Unix, you can use clock() to measure CPU time.')"
>> 1000 loops, best of 3: 230 usec per loop

>
> so I guess we should stick with the test unless there's a compelling
> reason for removing it.
>
> Are you suggesting that our table is more comprehensive than the unicode
> default argument set? I got it from the C code that implements the
> unicodectype so I hope it is the same for unicode.split.
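
The comparison quoted above can also be reproduced from within Python using the timeit module. This is only a sketch for Python 3: split_with_test and split_always_regex are hypothetical stand-ins for the two variants (with and without the "\xa0" test), the whitespace table is abbreviated, and the sample text is arbitrary. Absolute numbers will differ from the figures in the thread.

```python
import re
import timeit

# Hypothetical, abbreviated whitespace table without "\xa0".
_ws_split = re.compile("[ \t\n\r\x0b\x0c]+").split


def split_with_test(text):
    # Variant that keeps the test: regex only when an nbsp is present.
    if "\xa0" in text:
        return [w for w in _ws_split(text) if w]
    return text.split()


def split_always_regex(text):
    # Variant without the test: always use the regex split.
    return [w for w in _ws_split(text) if w]


# Arbitrary sample text; the second copy contains exactly one nbsp.
PLAIN = "the quick brown fox jumps over the lazy dog " * 20
WITH_NBSP = PLAIN.replace(" ", "\xa0", 1)


def best_usec(func, text, number=1000, repeat=3):
    # Best of `repeat` runs, reported as microseconds per call.
    timer = timeit.Timer(lambda: func(text))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6


if __name__ == "__main__":
    for label, text in (("no nbsp", PLAIN), ("one nbsp", WITH_NBSP)):
        for func in (split_with_test, split_always_regex):
            usec = best_usec(func, text)
            print("%-9s %-20s %.1f usec" % (label, func.__name__, usec))
```

Taking the minimum of several repeats follows the advice in the timeit documentation that is used as the sample string in the quoted commands.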


