[reportlab-users] speeding up parse_utf8?

Marius Gedminas reportlab-users@reportlab.com
Tue, 14 Oct 2003 20:10:13 +0300



On Tue, Oct 14, 2003 at 04:50:08PM +0100, Robin Becker wrote:
> Do you know if the python parse_utf8 in pdfmetrics is correct. I looked
> at the source code and see a lot more corners for the built in
> utf8_decode.

There's no parse_utf8 in pdfmetrics.  Did you mean parse_utf8 in
ttfonts.py?  It is mostly correct, in the sense that it decodes valid
UTF-8 correctly.  It does not reject every kind of invalid input, though
(overlong sequences, encoded surrogates, or noncharacters such as U+FFFE).
I would trust Python's builtin UTF-8 codec more.
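
To make the corner cases concrete, here is a minimal sketch that feeds a
few byte strings to the builtin codec.  It assumes a current Python 3
interpreter (bytes.decode with the default strict error handling); the
helper name is_valid_utf8 and the sample labels are made up for
illustration and are not part of ttfonts.py.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the builtin strict UTF-8 codec accepts `data`.

    Hypothetical helper for illustration; not part of ReportLab.
    """
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


# Byte strings covering the cases mentioned in this message.
samples = {
    "valid two-byte sequence (U+00E9)": b"\xc3\xa9",
    "overlong encoding of '/'": b"\xc0\xaf",
    "encoded surrogate (U+D800)": b"\xed\xa0\x80",
    "noncharacter (U+FFFE)": b"\xef\xbf\xbe",
}

results = {label: is_valid_utf8(raw) for label, raw in samples.items()}

for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

With Python 3 the overlong sequence and the encoded surrogate are
rejected.  U+FFFE is accepted, since its three-byte encoding is
well-formed UTF-8, so a caller that wants to refuse noncharacters has to
check for them after decoding.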

Marius Gedminas
-- 
You can't spell evil without vi.
