[reportlab-users] utf-8 characters
Tue, 4 May 2004 09:49:45 +0200
Thanks for your answers
> -----Original Message-----
> From: email@example.com
> [mailto:firstname.lastname@example.org]On Behalf Of Marius Gedminas
> Sent: Monday, 3 May 2004 11:34
> To: email@example.com
> Subject: Re: [reportlab-users] utf-8 characters
> On Mon, May 03, 2004 at 08:34:01AM +0100, Chris Withers wrote:
> > David Bourillot wrote:
> > >Exception type: exceptions.UnicodeDecodeError
> > >Exception message: 'utf8' codec can't decode byte 0xc3 in position 9:
> > >unexpected end of data
> > >
> > >After a little investigation, it seems to me that when the string is
> > >split, it's cut between the two bytes of the encoded character 'à'.
> > That would seem unlikely, but maybe ask on the python list for
> > Could it be that you have non-UTF-8 data in your UTF-8 string?
> I'm pretty sure the problem is in the line wrapping algorithm used by
> There have been plans to ditch Python 1.5.2 support and switch to
> unicode objects instead of str objects with UTF-8 data everywhere.
> When this is done, this problem will disappear, as there's no way to
> split a unicode string incorrectly.
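To illustrate the splitting problem described above, here is a minimal sketch in present-day Python 3 (not the Python of this thread), where bytes stands in for the old str holding UTF-8 data and str is already Unicode. The sample text and the helper names split_bytes and split_text are made up for the illustration; this is not Reportlab's wrapping code.

```python
# Hypothetical sample: 'à' starts at byte offset 9 and occupies two bytes.
text = "Exemple: \u00e0 la carte"
data = text.encode("utf-8")


def split_bytes(data, width):
    """Cut encoded data at a byte offset, ignoring character boundaries."""
    return data[:width], data[width:]


def split_text(text, width):
    """Cut decoded text at a character offset."""
    return text[:width], text[width:]


# Cutting after 10 bytes keeps only the first byte (0xc3) of 'à'.
head, tail = split_bytes(data, 10)
try:
    head.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc  # byte 0xc3 in position 9: unexpected end of data

# Cutting the decoded text at the same offset keeps 'à' whole.
text_head, text_tail = split_text(text, 10)
```

The byte-level cut fails in the same way as the reported traceback, while the cut on decoded text cannot land inside an encoded character.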
>  AFAIU Python does not use UTF-16 surrogate pairs, right? If you
> want to use characters outside the BMP, you're supposed to compile
> your Python interpreter with 32-bit Unicode support.
>  There are also combining characters that might pose problems with
> line wrapping. And I'm not talking about BiDi or other exotic
> things that Reportlab does not support yet.
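A minimal Python 3 sketch of the combining-character caveat, using only the standard unicodedata module. The helper safe_break is hypothetical and only steps back over combining marks; it is not a full line-breaking algorithm and not something Reportlab provides.

```python
import unicodedata

# In decomposed form (NFD), 'à' is 'a' followed by U+0300 COMBINING GRAVE ACCENT.
decomposed = unicodedata.normalize("NFD", "voil\u00e0")


def safe_break(text, index):
    """Move a break position left so it never falls just before a combining mark."""
    while 0 < index < len(text) and unicodedata.combining(text[index]):
        index -= 1
    return index


# A naive break at offset 5 would separate 'a' from its accent.
naive = (decomposed[:5], decomposed[5:])

# The adjusted break keeps the base letter and its accent together.
pos = safe_break(decomposed, 5)
adjusted = (decomposed[:pos], decomposed[pos:])

# Composing to NFC first avoids the issue for characters that have a
# precomposed form, which is not the case for every combination.
composed = unicodedata.normalize("NFC", decomposed)
```

So even with unicode objects, a wrapping routine that cuts at arbitrary character offsets can still detach an accent from its base letter when the text is in decomposed form.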
> Marius Gedminas
> Stupidity management for the superuser is a user space issue in Unix
> -- Alan Cox