[Scons-dev] Merge PR #235 before release
Gary Oberbrunner
garyo at oberbrunner.com
Wed May 27 08:42:42 EDT 2015
On Wed, May 27, 2015 at 6:52 AM, anatoly techtonik <techtonik at gmail.com>
wrote:
> What I need is a bulletproof way to convert from anything to unicode. This
> requires some kind of escaping to go forward and back. Some helper
> methods like u2b() (unicode to binary) and b2u(). I am quite surprised that
> so far I have found nothing for this "simple" case.
>
That's because, in general, the encoding of the "binary" string is unknown.
Is it ASCII, UTF-8, Windows CP-1252, Shift-JIS, or something else? You
can't decode such a string to Unicode without knowing its encoding. Check
out the python-3 branch, where we've been working through some of those
issues. Your u2b is "easy" if you assume you want the binary to be UTF-8
encoded, which is normally safe: UTF-8 can encode every Unicode character,
so this conversion is guaranteed to work.
Your b2u is not so easy. You can't just assume UTF-8, as you might think;
if the string contains invalid UTF-8 bytes, str.decode() will either raise
an error or substitute replacement characters, depending on the errors
argument you pass it. And even when decoding succeeds, the text will be
mangled if it's actually in a different encoding than you expect.
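For illustration, here is a minimal Python 3 sketch of the u2b()/b2u() pair.
The function names come from the quoted message; the implementation is an
assumption, not code from SCons or the python-3 branch. It relies on Python
3's "surrogateescape" error handler (PEP 383, not available in Python 2),
which is one existing form of the forward-and-back escaping asked about:
undecodable bytes survive a round trip, but the decoded text is still only
meaningful if the bytes really were UTF-8.

```python
# Sketch only: u2b()/b2u() are the hypothetical helper names from the quoted
# message, implemented here for Python 3 (not taken from SCons).


def u2b(text: str) -> bytes:
    """Unicode -> bytes, always as UTF-8.

    UTF-8 can encode every Unicode character. With "surrogateescape", the
    lone surrogates U+DC80..U+DCFF that b2u() produces turn back into the
    original bytes 0x80..0xFF; any other lone surrogate still raises
    UnicodeEncodeError.
    """
    return text.encode("utf-8", errors="surrogateescape")


def b2u(data: bytes) -> str:
    """Bytes of unknown encoding -> Unicode, *assuming* UTF-8.

    Each byte that is not part of a valid UTF-8 sequence becomes a lone
    surrogate U+DC80..U+DCFF instead of raising, so u2b(b2u(data)) == data
    for any input. The result is only readable text if the bytes really
    were UTF-8.
    """
    return data.decode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    latin = "café".encode("cp1252")  # b"caf\xe9", which is not valid UTF-8

    # Default strict decoding raises on the stray 0xE9 byte.
    try:
        latin.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("strict:", exc)

    # errors="replace" substitutes U+FFFD; the original byte is lost.
    print("replace:", ascii(latin.decode("utf-8", errors="replace")))

    # surrogateescape keeps the byte recoverable.
    escaped = b2u(latin)
    print("escaped:", ascii(escaped))  # 'caf\udce9'
    print("round trip ok:", u2b(escaped) == latin)  # True

    # Decoding with the wrong codec can "succeed" but mangle the text.
    mangled = "café".encode("utf-8").decode("cp1252")
    print("mangled:", ascii(mangled))  # 'caf\xc3\xa9', i.e. "cafÃ©"
```

The design trade-off: surrogateescape makes b2u() lossless, but it does not
solve the unknown-encoding problem. CP-1252 bytes come back as escaped
surrogates, not as "é", so something downstream still has to know or guess
the real encoding.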
--
Gary