[reportlab-users] utf-8 characters

David Bourillot reportlab-users@reportlab.com
Fri, 30 Apr 2004 11:17:29 +0200


Hello,

Thanks for your help. I use the Times New Roman font and it works fine for
most of the documents.
But I have a problem with one that contains this string: "UNIVERSITà DI
NAPOLI".
The string is encoded in UTF-8, and when I generate the PDF I get this
error:

  File "c:/MaKaC/indico/code/code\MaKaC\webinterface\rh\base.py", line 204, in process
    res = self._process()
  File "c:/MaKaC/indico/code/code\MaKaC\webinterface\rh\abstractModif.py", line 88, in _process
    data = pdf.getPDFBin()
  File "c:/MaKaC/indico/code/code\MaKaC\PDFinterface\base.py", line 137, in getPDFBin
    self._doc.build(self._story, onFirstPage=self.firstPage, onLaterPages=self.laterPages)
  File "C:\Python23\lib\site-packages\reportlab\platypus\doctemplate.py", line 801, in build
    BaseDocTemplate.build(self,flowables)
  File "C:\Python23\lib\site-packages\reportlab\platypus\doctemplate.py", line 631, in build
    self.handle_flowable(flowables)
  File "C:\Python23\lib\site-packages\reportlab\platypus\doctemplate.py", line 549, in handle_flowable
    if self.frame.add(f, self.canv, trySplit=self.allowSplitting):
  File "C:\Python23\lib\site-packages\reportlab\platypus\frames.py", line 120, in _add
    w, h = flowable.wrap(self._getAvailableWidth(), h)
  File "C:\Python23\lib\site-packages\reportlab\platypus\paragraph.py", line 421, in wrap
    self.blPara = self.breakLines([first_line_width, later_widths])
  File "C:\Python23\lib\site-packages\reportlab\platypus\paragraph.py", line 564, in breakLines
    for w in _getFragWords(frags):
  File "C:\Python23\lib\site-packages\reportlab\platypus\paragraph.py", line 199, in _getFragWords
    n = n + stringWidth(w, f.fontName, f.fontSize)
  File "C:\Python23\lib\site-packages\reportlab\pdfbase\pdfmetrics.py", line 632, in _slowStringWidth
    return font.stringWidth(text, fontSize)
  File "C:\Python23\lib\site-packages\reportlab\pdfbase\ttfonts.py", line 987, in stringWidth
    for code in parse_utf8(text):
  File "C:\Python23\lib\site-packages\reportlab\pdfbase\ttfonts.py", line 82, in
    parse_utf8=lambda x, decode=codecs.lookup('utf8')[1]: map(ord,decode(x)[0])

Exception type: exceptions.UnicodeDecodeError
Exception message: 'utf8' codec can't decode byte 0xc3 in position 9: unexpected end of data


After a little investigation, it seems to me that when the string is
split, it is cut between the two bytes of the encoded character 'à'.
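The hypothesis can be checked outside ReportLab. The sketch below is a Python 3 transcription of the `parse_utf8` lambda from the bottom frame of the traceback, fed by a hypothetical splitter, `words_split_as_latin1`. That splitter is an assumption, not ReportLab's code: it treats the bytes as Latin-1 text, where 0xA0 (the second byte of 'à', which encodes as 0xC3 0xA0) is a no-break space, so a whitespace split would break the word right after 0xC3.

```python
import codecs

# Python 3 transcription of parse_utf8 from ttfonts.py (traceback, line 82).
# codecs.lookup(...)[1] is the codec's stateless decode function, which
# treats its input as complete and so rejects a truncated sequence.
_decode = codecs.lookup("utf8")[1]


def parse_utf8(data: bytes) -> list:
    """Return the list of code points in a UTF-8 byte string."""
    return list(map(ord, _decode(data)[0]))


def words_split_as_latin1(data: bytes) -> list:
    """Hypothetical splitter (an assumption for illustration): reads the
    bytes as Latin-1 text and splits on whitespace, so byte 0xA0
    (no-break space in Latin-1) is treated as a word break."""
    return [w.encode("latin-1") for w in data.decode("latin-1").split()]


data = "UNIVERSITà DI NAPOLI".encode("utf-8")
words = words_split_as_latin1(data)
print(words)  # [b'UNIVERSIT\xc3', b'DI', b'NAPOLI']

try:
    parse_utf8(words[0])  # the first "word" ends with a lone lead byte 0xC3
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xc3 in position 9: unexpected end of data
```

The error names byte 0xc3 at position 9, the same byte and offset as in the traceback. That fits a word cut between the two bytes of 'à', though it does not by itself prove which splitting step did the cutting.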

Am I right? Is it a known bug?
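Whatever the answer, the general technique for avoiding this class of error is to split or truncate decoded text, or at least only on UTF-8 character boundaries, never at arbitrary byte positions. The helpers below, `split_words_utf8` and `truncate_utf8`, are hypothetical names in a minimal Python 3 sketch. They are not part of ReportLab and do not patch its own line-breaking code; they only illustrate the technique for code that you control.

```python
from typing import List


def split_words_utf8(data: bytes) -> List[bytes]:
    """Split UTF-8 bytes into words without cutting a character in two.

    Decoding first means the split sees 'à' as one character rather than
    the bytes 0xC3 0xA0; each word is re-encoded afterwards.
    """
    text = data.decode("utf-8")
    return [word.encode("utf-8") for word in text.split()]


def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Cut UTF-8 bytes to at most `limit` bytes on a character boundary."""
    if limit >= len(data):
        return data
    cut = max(limit, 0)
    # UTF-8 continuation bytes have the form 0b10xxxxxx. If the byte at the
    # cut is one, a character straddles the cut, so step back to its lead byte.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


if __name__ == "__main__":
    data = "UNIVERSITà DI NAPOLI".encode("utf-8")
    print(split_words_utf8(data))  # [b'UNIVERSIT\xc3\xa0', b'DI', b'NAPOLI']
    print(truncate_utf8(data, 10))  # b'UNIVERSIT'
```

Backing off to the lead byte drops the whole straddling character instead of leaving half of it, so every piece still decodes as valid UTF-8.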

Best regards,
David
  -----Original Message-----
  From: reportlab-users-admin@reportlab.com [mailto:reportlab-users-admin@reportlab.com] On Behalf Of Amit Mongia
  Sent: Thursday, 29 April 2004 11:58
  To: reportlab-users@reportlab.com
  Subject: Re: [reportlab-users] utf-8 characters


  Hi,
  Create a TTF font object and render the text with it. Go
  through the rina.ttf example that comes with the user
  guide.
  You can use the popular Windows font Times New Roman
  instead, or another TTF font of your choice.
  This works through font embedding.
  Regards,
  Amit Mongia
  --- David Bourillot <David.Bourillot@cern.ch> wrote:
  > Hello,
  >
  > I use reportlab to generate documents, and my problem
  > is that some special characters are not displayed
  > correctly. I use strings encoded in UTF-8.
  > How can I get these characters displayed correctly?
  >
  > Thanks in advance,
  > David
  >
  > _______________________________________________
  > reportlab-users mailing list
  > reportlab-users@reportlab.com
  > http://two.pairlist.net/mailman/listinfo/reportlab-users
