[reportlab-users] Unicode handling bugs - 2.0
Robin Becker
robin at reportlab.com
Mon Jun 5 10:52:31 EDT 2006
Greg Phillips wrote:
> First of all, thanks much for going Unicode in 2.0, and especially for
> fixing the KeepTogether bug. Both of those simplify my life enormously.
........
> In paragraph.py, there are two places (lines 279 and 298) where tests like:
>
> if type(bulletText) is StringType:
>
> are made to determine whether the bullets are text or lists of
> fragments. This breaks for the obvious reason if the bullet text is
> unicode. I suggest changing these lines to:
>
> if isinstance(bulletText, basestring):
I'm fairly sure we're allowed to have both here now; the decision to accept
either utf8 or unicode was made fairly late and probably without enough checking.
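
A minimal sketch of the suggested check, for illustration only. The helper name is_bullet_text and the string_types tuple are hypothetical and not part of ReportLab; the fallback branch is only there so the snippet also runs on interpreters without basestring.

```python
# Hypothetical helper, not ReportLab code: accept byte strings and unicode alike.
try:
    string_types = (basestring,)   # Python 2: base class of str and unicode
except NameError:
    string_types = (str, bytes)    # Python 3: text and byte strings

def is_bullet_text(bulletText):
    """Return True for plain bullet text, False for anything else (e.g. a fragment list)."""
    return isinstance(bulletText, string_types)

print(is_bullet_text("1."))        # byte string on Python 2, text on Python 3
print(is_bullet_text(u"\u2022"))   # unicode bullet character
print(is_bullet_text(["frag"]))    # a list of fragments is not bullet text
```

An identity test against StringType only matches the exact str type, which is why a unicode bullet falls through to the fragment-list branch.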
> There's a similar error at line 1186 of pdfdoc.py. A quick grep shows
> other instances of "is StringType" in the library, but I haven't
> investigated whether these are bugs or not.
I think that one is supposed to be a string.
>
> Also, in paraparser.py, line 710, there's a conversion to cp1252
> encoding to make sgmlop happy; this was causing errors when my input
> included characters that weren't recognized in that encoding. Changing
> the encoding to utf-8 seemed to solve the problem, but I don't know
> enough about what's really going on there to know if that's the Right
> Thing To Do.
.....
Seems right to me, but I don't really know what happens if we get a '<' as part
of a multi-byte character in utf8.
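
On the '<' question: in UTF-8 every byte of a multi-byte sequence has its high bit set (0x80 or above), so an ASCII byte such as '<' (0x3C) cannot occur inside one. The sketch below checks that property and also shows why cp1252 fails on characters outside its repertoire. The function names are illustrative only; nothing here is taken from paraparser.py or sgmlop.

```python
def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

def has_ascii_byte_in_multibyte(ch):
    """Return True if a multi-byte UTF-8 encoding of ch contains a byte below 0x80."""
    data = bytearray(ch.encode("utf-8"))
    return len(data) > 1 and any(b < 0x80 for b in data)

cjk = u"\u4e2d"                      # a character with no cp1252 mapping
print(can_encode(cjk, "cp1252"))     # cp1252 cannot represent it
print(can_encode(cjk, "utf-8"))      # utf-8 can
print(has_ascii_byte_in_multibyte(cjk))
```

So a byte-oriented scan for '<' should not be confused by multi-byte characters, provided the input really is valid UTF-8.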
--
Robin Becker