[reportlab-users] UTF-8 and fonts

Marius Gedminas reportlab-users@reportlab.com
Tue, 4 May 2004 19:49:49 +0300


On Tue, May 04, 2004 at 05:08:27PM +0200, Ulrich Wisser wrote:
> I followed the instructions and have been able to create a PDF with utf
> encoded text. So far only with the Rina font. What I don't understand is
> why the UTF-8 decoding is connected to TTF fonts? Could I use UTF-8 with
> the standard fonts (Helvetica, Times-Roman)?

It's been a while since I last peeked into ReportLab font internals, but
here's what I remember.

Basically the charset of strings depends on the font used.  You can
register a custom Type1 font with any encoding you can think of, and
then you have to use that specific encoding when writing text.  The
default 14 fonts use Windows-1252 (or MacRoman, depending on your
platform).  TrueType fonts currently only support UTF-8.  In theory
TrueType font classes could be enhanced to support other encodings as
well, but IMHO that would be a step in the wrong direction.
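
To make the encoding point concrete, here is a rough illustration that
uses only Python's built-in codecs and no ReportLab calls at all.  It
assumes a modern Python 3 interpreter, and the sample string is made
up.  It shows that the same text becomes different bytes depending on
which encoding the font expects, and what happens when the two are
mixed up.

```python
# Illustration only: standard-library codecs, not ReportLab API.
text = "naïve café"  # made-up sample text

# Bytes a font set up for Windows-1252 or MacRoman would expect.
win1252_bytes = text.encode("cp1252")
macroman_bytes = text.encode("mac_roman")

# Bytes a font that expects UTF-8 would be given.
utf8_bytes = text.encode("utf-8")

print(win1252_bytes)
print(macroman_bytes)
print(utf8_bytes)

# Handing UTF-8 bytes to something that reads them as Windows-1252
# does not raise an error here; it silently produces the wrong characters.
misread = utf8_bytes.decode("cp1252")
print(misread)
```

The single-byte encodings use one byte per character, while UTF-8 needs
two bytes for each accented letter, so strings prepared for one kind of
font cannot simply be passed to the other.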

Here's what I would consider a step in the right direction: Unicode
strings everywhere and font classes that convert Unicode characters into
the appropriate encoding if necessary.  It would be trivial to implement
for TrueType font classes, and shouldn't be difficult for Type1 fonts.
One possible complication is that PyRXP does not support Unicode (or can
sometimes be compiled without Unicode support).  There may be other
complications as well (CIDfonts?  I know next to nothing about them.).
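
A minimal sketch of that idea, under stated assumptions: the class
SketchFont, its encode_text method, and the "replace" error policy are
all hypothetical names and choices made up for illustration, not
ReportLab classes or behaviour.  Callers pass Unicode strings
everywhere, and each font object converts to whatever byte encoding it
needs.  The code is written for Python 3.

```python
class SketchFont:
    """Hypothetical font wrapper; not a ReportLab class."""

    def __init__(self, name, encoding):
        self.name = name
        self.encoding = encoding  # byte encoding this font expects

    def encode_text(self, text):
        # Callers always pass Unicode; the font picks the byte encoding.
        # Characters the encoding cannot represent become b"?" here,
        # which is an assumed policy, not something from the original post.
        return text.encode(self.encoding, errors="replace")


# A standard font (Windows-1252) and a TrueType font (UTF-8).
helvetica = SketchFont("Helvetica", "cp1252")
rina = SketchFont("Rina", "utf-8")

for font in (helvetica, rina):
    print(font.name, font.encode_text("é"))
```

With this shape the calling code never needs to know which encoding a
given font uses; only the font object does.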

> Is there an explanation of this somewhere for download?

"Historical reasons", I suppose.

Marius Gedminas
lg_PC.gigacharset (lg = little green men language, PC = proxima centaur
	-- Markus Kuhn provides an example of a locale
