[reportlab-users] utf-8 characters
David Bourillot
reportlab-users@reportlab.com
Tue, 4 May 2004 09:49:45 +0200
Thanks for your answers.
Cheers,
David
> -----Original Message-----
> From: reportlab-users-admin@reportlab.com
> [mailto:reportlab-users-admin@reportlab.com] On Behalf Of Marius Gedminas
> Sent: Monday, 3 May 2004 11:34
> To: reportlab-users@reportlab.com
> Subject: Re: [reportlab-users] utf-8 characters
>
>
> On Mon, May 03, 2004 at 08:34:01AM +0100, Chris Withers wrote:
> > David Bourillot wrote:
> > >Exception type: exceptions.UnicodeDecodeError
> > >Exception message: 'utf8' codec can't decode byte 0xc3 in position 9:
> > >unexpected end of data
> > >
> > >After a little investigation, it seems to me that when the string is
> > >split, it's cut between the two bytes of the encoded character 'à'.
> >
> > That would seem unlikely, but maybe ask on the python list for
> > confirmation.
> >
> > Could it be that you have non-UTF-8 data in your UTF-8 string?
>
> I'm pretty sure the problem is in the line wrapping algorithm used by
> Platypus.
>
> There have been plans to ditch Python 1.5.2 support and switch to
> unicode objects instead of str objects with UTF-8 data everywhere.
> When this is done, this problem will disappear, as there's no way to
> split a unicode string incorrectly [1][2].
>
> [1] AFAIU Python does not use UTF-16 surrogate pairs, right? If you
> want to use characters outside the BMP, you're supposed to compile
> your Python interpreter with 32-bit Unicode support.
>
> [2] There are also combining characters that might pose problems with
> line wrapping. And I'm not talking about BiDi or other exotic
> things that Reportlab does not support yet.
>
> Marius Gedminas
> --
> Stupidity management for the superuser is a user space issue in Unix
> systems.
> -- Alan Cox
>
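
For anyone who finds this thread later, here is a minimal sketch of the
failure discussed above. It is written in Python 3 syntax, whereas the
thread concerns Python 2 byte strings, and it is not Platypus's actual
wrapping code. The sample string and the helper names try_decode and
safe_utf8_cut are made up for illustration. It shows that cutting UTF-8
bytes between the two bytes of 'à' gives the same kind of decode error,
that slicing the decoded text cannot split a code point, and one possible
way to back a byte cut up to a character boundary.

```python
text = "Bonjour, à"              # 'à' is the character at index 9
data = text.encode("utf-8")      # 'à' becomes the two bytes 0xC3 0xA0


def try_decode(chunk):
    """Return the decoded text, or the UnicodeDecodeError that was raised."""
    try:
        return chunk.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc


def safe_utf8_cut(raw, limit):
    """Cut raw to at most limit bytes without splitting a character.

    Hypothetical helper: UTF-8 continuation bytes look like 0b10xxxxxx,
    so step back until the byte at the cut position starts a character.
    """
    if limit >= len(raw):
        return raw
    end = max(limit, 0)
    while end > 0 and (raw[end] & 0xC0) == 0x80:
        end -= 1
    return raw[:end]


# Cutting after 10 bytes keeps 0xC3 but drops 0xA0, so decoding fails.
broken = try_decode(data[:10])

# Slicing the decoded text counts code points, so 'à' stays whole.
whole = text[:10]

# Backing up to a character boundary drops the incomplete 'à' instead.
trimmed = safe_utf8_cut(data, 10).decode("utf-8")

print(repr(broken))
print(repr(whole))
print(repr(trimmed))
```

The slice of the decoded text is safe only at the code point level.
Combining characters, as footnote [2] points out, can still be separated
from their base character by a naive split.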