[reportlab-users] speeding up parse_utf8?
Marius Gedminas
reportlab-users@reportlab.com
Tue, 14 Oct 2003 20:10:13 +0300
On Tue, Oct 14, 2003 at 04:50:08PM +0100, Robin Becker wrote:
> Do you know if the python parse_utf8 in pdfmetrics is correct. I looked
> at the source code and see a lot more corners for the built in
> utf8_decode.
There's no parse_utf8 in pdfmetrics. Did you mean parse_utf8 in
ttfonts.py? It is mostly correct, in the sense that it decodes valid
UTF-8 correctly. It does not reject all cases of invalid UTF-8 (such as
overlong sequences, noncharacters such as U+FFFE, or surrogates).
I would trust Python's builtin UTF-8 codec more.
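
As a rough illustration of the kind of input a strict decoder should
refuse, here is a minimal sketch using the builtin codec. It assumes a
current Python 3 interpreter (older interpreters may be more lenient,
for example with surrogates), and is_valid_utf8 is a hypothetical helper
name, not something that exists in ttfonts.py:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the builtin codec accepts data as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hand-picked byte strings; the labels describe what each one encodes.
samples = {
    "valid two-byte sequence (U+00E9)": b"\xc3\xa9",
    "overlong encoding of '/'": b"\xc0\xaf",
    "encoded surrogate U+D800": b"\xed\xa0\x80",
    "truncated three-byte sequence": b"\xe2\x82",
}

for label, raw in samples.items():
    print(f"{label}: {'accepted' if is_valid_utf8(raw) else 'rejected'}")
```

Feeding the same byte strings to parse_utf8 and comparing the results
would be one way to see where the two decoders disagree.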
Marius Gedminas
-- 
You can't spell evil without vi.