[reportlab-users] utf-8 characters

David Bourillot reportlab-users@reportlab.com
Tue, 4 May 2004 09:49:45 +0200


Thanks for your answers

Cheers,
David

> -----Original Message-----
> From: reportlab-users-admin@reportlab.com
> [mailto:reportlab-users-admin@reportlab.com]On Behalf Of Marius Gedminas
> Sent: lundi 3 mai 2004 11:34
> To: reportlab-users@reportlab.com
> Subject: Re: [reportlab-users] utf-8 characters
>
>
> On Mon, May 03, 2004 at 08:34:01AM +0100, Chris Withers wrote:
> > David Bourillot wrote:
> > >Exception type: exceptions.UnicodeDecodeError
> > >Exception message: 'utf8' codec can't decode byte 0xc3 in position 9:
> > >unexpected end of data
> > >
> > >After a little investigation, it seems to me that when the string is
> > >split, it's cut between the two bytes of the encoded character 'à'
> >
> > That would seem unlikely, but maybe ask on the Python list for
> > confirmation.
> >
> > Could it be that you have non-UTF-8 data in your UTF-8 string?
>
> I'm pretty sure the problem is in the line wrapping algorithm used by
> Platypus.
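The failure described above can be reproduced with a deliberately naive wrapper. This is a Python 3 sketch, and `naive_byte_wrap` is a hypothetical function, not Platypus's actual wrapping code: it simply cuts a UTF-8 byte string every `width` bytes. Because 'à' encodes to the two bytes 0xC3 0xA0, a cut between them leaves a line ending in a lone lead byte, and decoding that line fails with the same "unexpected end of data" reason. The sample text is made up for illustration.

```python
from typing import List


def naive_byte_wrap(data: bytes, width: int) -> List[bytes]:
    """Hypothetical wrapper: cut a UTF-8 byte string every `width` bytes,
    with no regard for where multi-byte characters begin or end."""
    return [data[i:i + width] for i in range(0, len(data), width)]


text = "Bonjour, à tous"
data = text.encode("utf-8")  # 'à' becomes the two bytes 0xC3 0xA0

lines = naive_byte_wrap(data, 10)
print(lines)  # [b'Bonjour, \xc3', b'\xa0 tous']: 'à' is split across two lines

error = None
try:
    lines[0].decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; `exc` is cleared when the except block ends

print(error.start, error.reason)  # 9 unexpected end of data
```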
>
> There have been plans to ditch Python 1.5.2 support and switch to
> unicode objects instead of str objects with UTF-8 data everywhere.
> When this is done, this problem will disappear, as there's no way to
> split a unicode string incorrectly [1][2].
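A minimal sketch of the unicode-object approach, again in Python 3, where `str` holds Unicode code points and `bytes` holds encoded data. The pattern shown (decode once on input, wrap the decoded text, encode only on output) is an assumption for illustration, not ReportLab's API. Every slice of a `str` is a whole number of code points, so re-encoding each line cannot fail the way the byte cut does.

```python
data = "Bonjour, à tous".encode("utf-8")

# Decode once, at the input boundary.
text = data.decode("utf-8")

# Wrap on code points rather than bytes, using a simple fixed width.
width = 10
lines = [text[i:i + width] for i in range(0, len(text), width)]

# Encode only when emitting output; each line holds complete characters.
encoded = [line.encode("utf-8") for line in lines]

print(lines)    # ['Bonjour, à', ' tous']
print(encoded)  # [b'Bonjour, \xc3\xa0', b' tous']
```

A real wrapper would break at spaces rather than at a fixed width, but the point is the same: once the text is decoded, any cut the wrapper makes falls between characters.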
>
>   [1] AFAIU Python does not use UTF-16 surrogate pairs, right?  If you
>       want to use characters outside the BMP, you're supposed to compile
>       your Python interpreter with 32-bit Unicode support.
>
>   [2] There are also combining characters that might pose problems with
>       line wrapping.  And I'm not talking about BiDi or other exotic
>       things that Reportlab does not support yet.
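Both footnotes can be checked in a few lines of Python 3. On CPython 3.3 and later (PEP 393) a `str` is indexed by code point on every build, so the narrow/wide build distinction in footnote [1] no longer applies and a character outside the BMP is a single element. Footnote [2] still holds: a base letter followed by a combining mark is two code points, so a code-point slice can separate them. Normalizing to NFC with the standard `unicodedata` module merges such pairs when a precomposed character exists; that narrows the problem but does not remove it, since not every combining sequence has a precomposed form. The strings below are illustrative.

```python
import unicodedata

# Footnote [1]: a character outside the BMP is one code point in a str...
smiley = "ab\U0001F600cd"
print(len(smiley))                           # 5
# ...even though UTF-16 needs a surrogate pair (two code units) for it.
print(len(smiley.encode("utf-16-le")) // 2)  # 6

# Footnote [2]: 'a' + U+0300 COMBINING GRAVE ACCENT displays as 'à'
# but is two code points, so a slice can cut between them.
decomposed = "Voila\u0300"
print(len(decomposed))       # 6
print(repr(decomposed[:5]))  # 'Voila' (the accent is left in the next slice)

# NFC composes the pair into the single code point U+00E0.
composed = unicodedata.normalize("NFC", decomposed)
print(len(composed), composed == "Voil\u00e0")  # 5 True
```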
>
> Marius Gedminas
>