[reportlab-users] String shapes and encodings

Robin Becker robin at reportlab.com
Wed Jun 8 04:04:05 EDT 2022


..........
> No, the isUnicode check would force text input to be Unicode (a normal Python string). The encoding parameter should be 
> deprecated/removed at some point.
> So instead of String(b'd\xe9j\xe0', encoding='latin-1'), users should pass String(b'd\xe9j\xe0'.decode('latin-1')).
> 
> Claude
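For reference, the two spellings in the quoted proposal differ only in where the decode happens. A minimal plain-Python sketch of the caller-side decode (no ReportLab import; it only shows what the proposed `.decode('latin-1')` does to the example bytes):

```python
# The byte string from the quoted example, encoded as Latin-1.
raw = b'd\xe9j\xe0'

# Caller-side decode, as proposed: the caller chooses the encoding explicitly.
text = raw.decode('latin-1')
print(text)  # déjà

# The same bytes are not valid UTF-8, so a UTF-8 default would reject them.
try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    print('not UTF-8:', exc.reason)
```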

For whatever reason, we decided to allow either UTF-8 bytes or unicode as inputs to many of the ReportLab functions/methods.

It seems to me that forcing the decode onto the caller is wrong.

1) The caller does not always know whether the values passed are bytes or unicode, or
    what a suitable encoding might be.
2) If every caller has to test and convert, that code gets scattered everywhere rather than living in the callee.
3) Nothing in the current code prevents Claude from making the decode explicit at the call site.
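A minimal sketch of points 2 and 3, assuming a hypothetical helper `as_text` and a hypothetical function `make_label` standing in for a ReportLab method such as `String` (neither is ReportLab API). The conversion lives once in the callee, and a caller who prefers to decode explicitly is unaffected:

```python
from typing import Union


def as_text(value: Union[str, bytes], encoding: str = 'utf-8') -> str:
    """Normalize str-or-bytes input to str in one place (the callee)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def make_label(text: Union[str, bytes], encoding: str = 'utf-8') -> str:
    # Hypothetical stand-in for a ReportLab-style function that accepts
    # either bytes or str; callers do not need their own conversion code.
    text = as_text(text, encoding)
    return f'<label {text}>'


# Default: UTF-8 bytes are decoded by the callee.
a = make_label('déjà'.encode('utf-8'))
# Explicit caller-side decode still works unchanged (point 3).
b = make_label(b'd\xe9j\xe0'.decode('latin-1'))
# Or the caller passes the encoding alongside the bytes.
c = make_label(b'd\xe9j\xe0', encoding='latin-1')
print(a, b, c)
```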

Our default works in a lot of places, but I agree that it won't suit many Windows users, etc.
It might have been better to have a user-controllable default encoding; we could then use it in many of the argument
definitions, i.e.  func(.....,encoding=rl_config.default_byte_encoding,....)
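A minimal sketch of a user-controllable default, assuming a hypothetical `rl_config` namespace with the `default_byte_encoding` setting proposed above (not an existing ReportLab option). One design point: a default written directly into the signature, as in `encoding=rl_config.default_byte_encoding`, is evaluated once when the `def` runs, so a `None` sentinel resolved at call time is one way to let later changes to the setting take effect:

```python
import types

# Hypothetical stand-in for rl_config; default_byte_encoding is the
# setting proposed in the post, not an existing ReportLab option.
rl_config = types.SimpleNamespace(default_byte_encoding='utf-8')


def as_text(value, encoding=None):
    # Resolve the default at call time, not at definition time, so a
    # later change to rl_config.default_byte_encoding is honoured.
    if encoding is None:
        encoding = rl_config.default_byte_encoding
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


first = as_text('déjà'.encode('utf-8'))      # decoded as UTF-8
rl_config.default_byte_encoding = 'cp1252'    # e.g. a Windows user's choice
second = as_text(b'd\xe9j\xe0')               # now decoded as cp1252
print(first, second)
```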

If the decode fails, we could fall back on chardet or similar.
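A minimal sketch of that fallback, assuming a hypothetical helper `decode_with_fallback`. It tries the configured encoding first, then asks `chardet.detect` for a guess if chardet happens to be installed, and finally uses Latin-1, which maps every byte to a code point and so never raises:

```python
try:
    import chardet  # optional third-party detector
except ImportError:
    chardet = None


def decode_with_fallback(data: bytes, encoding: str = 'utf-8') -> str:
    # 1) The configured/default encoding.
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        pass
    # 2) A guess from chardet, if available; its guess may be None,
    #    unknown to Python, or simply wrong for these bytes.
    if chardet is not None:
        guess = chardet.detect(data).get('encoding')
        if guess:
            try:
                return data.decode(guess)
            except (UnicodeDecodeError, LookupError):
                pass
    # 3) Last resort: Latin-1 accepts any byte sequence.
    return data.decode('latin-1')


print(decode_with_fallback('déjà'.encode('utf-8')))
print(decode_with_fallback(b'd\xe9j\xe0'))
```

Note that detection on short inputs is only a heuristic, so the result of step 2 for a few bytes may not be the encoding the user intended.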

I think removing the ability to use bytes is not an improvement.
-- 
Robin Becker

