[reportlab-users] Unicode handling bugs - 2.0

Mon Jun 5 09:41:38 EDT 2006

First of all, thanks much for going Unicode in 2.0, and especially  
for fixing the KeepTogether bug. Both of those simplify my life  
enormously.

I've discovered some bugs relating to the Unicode change.

In paragraph.py, there are two places (lines 279 and 298) where tests  
like:

	if type(bulletText) is StringType:

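are used to decide whether the bullet is plain text or a list of
fragments. This breaks when the bullet text is unicode: a unicode
object's type is UnicodeType, not StringType, so the exact-type test
fails and the text falls through to the fragment-list branch. I
suggest changing these lines to: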

	if isinstance(bulletText, basestring):
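To make the failure concrete, here's a minimal sketch that runs on both Python 2 and Python 3. It doesn't use ReportLab itself. The helper names `is_text_old` and `is_text_new` are hypothetical. On Python 3, where `basestring` no longer exists, the sketch substitutes `(str, bytes)` as an assumption.

```python
# Minimal sketch (Python 2 and 3): contrasts the exact-type test used in
# paragraph.py with the suggested isinstance() test.
try:
    string_types = basestring  # Python 2: common base of str and unicode
except NameError:
    string_types = (str, bytes)  # Python 3 stand-in (assumption of this sketch)

BYTE_STRING_TYPE = type(b"")  # StringType on Python 2, bytes on Python 3


def is_text_old(bullet_text):
    """Mirrors `type(bulletText) is StringType`: byte strings only."""
    return type(bullet_text) is BYTE_STRING_TYPE


def is_text_new(bullet_text):
    """Mirrors `isinstance(bulletText, basestring)`: byte or unicode text."""
    return isinstance(bullet_text, string_types)


unicode_bullet = u"\u2022"        # a unicode bullet character
byte_bullet = b"-"                # a plain byte-string bullet
fragments = [object(), object()]  # stand-in for a list of fragments

print(is_text_old(unicode_bullet))  # False: unicode text misclassified
print(is_text_new(unicode_bullet))  # True
print(is_text_new(byte_bullet))     # True
print(is_text_new(fragments))       # False
```

A side benefit of `isinstance` is that it also accepts subclasses of `str` and `unicode`, which the exact `type(...) is` test rejects.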

There's a similar error at line 1186 of pdfdoc.py. A quick grep shows  
other instances of "is StringType" in the library, but I haven't  
investigated whether these are bugs or not.

Also, in paraparser.py, line 710, there's a conversion to cp1252
encoding to make sgmlop happy; this raised encoding errors when my
input included characters that cp1252 cannot represent.
Changing the encoding to utf-8 seemed to solve the problem, but I
don't know enough about what's really going on there to know whether
that's the Right Thing To Do.
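Here's a minimal sketch of the encoding step alone, assuming plain `unicode.encode` semantics. It doesn't reproduce paraparser's actual code or show what sgmlop does with the resulting bytes. `encode_or_none` is a hypothetical helper. The last lines illustrate the open question: whatever encoding is chosen, any later step has to decode with the same one.

```python
def encode_or_none(text, encoding):
    """Return text encoded with `encoding`, or None if it can't be represented."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


sample = u"caf\u00e9 \u03b1"  # e-acute exists in cp1252; Greek alpha does not

print(encode_or_none(sample, "cp1252") is None)  # True: alpha can't be encoded
print(repr(encode_or_none(sample, "utf-8")))     # every character encodes

# If UTF-8 bytes are later read back as cp1252, non-ASCII text is garbled
# without any error being raised.
garbled = u"caf\u00e9".encode("utf-8").decode("cp1252")
print(garbled == u"caf\u00c3\u00a9")  # True
```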

Greg