[reportlab-users] String shapes and encodings
Claude Paroz
claude at 2xlibre.net
Tue Jun 7 12:27:52 EDT 2022
On 07.06.22 at 10:40, Robin Becker wrote:
> On 06/06/2022 15:28, Claude Paroz wrote:
>> Hi,
>>
>> In the spirit of Python3 strings being always Unicode, I think that
>> ReportLab String shape should behave the same and accept only Python
>> strings.
>> I admit this might be slightly backwards incompatible, but it could be
>> a first step in string handling simplification in ReportLab. The next
>> step could be a similar patch for platypus Paragraph.
>>
>> Claude
>
> I don't think the fact that python regards a specific encoding of glyphs
> to be strings has much relevance here. Most of the external data is in
> byte form whether encoded as unicode utf8 etc etc.
>
> When python started to provide a unicode encoding of glyphs reportlab
> had to support them because people wanted to use them. Today people
> still want to use bytes.
Of course, at a certain point in time, any digital content is a matter
of bytes. That's not what is discussed here.
The approach Python chose is to push character conversion to the
process boundaries, that is, to input and output time. When you get
some string input, you have to know (or guess) the encoding, and the idea
is to convert it to Unicode immediately. Then, for the whole lifetime of
the string in your program, it is Unicode (the Python 3 str type).
Finally, at some point you have to produce some output, and that's the
time to convert back to bytes in the encoding the output consumer
expects.
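
To illustrate the boundary idea, here is a minimal sketch. The function
names (read_text, shout, write_text) are made up for this example and are
not part of ReportLab; the only real calls are bytes.decode and
str.encode from the standard library.

```python
def read_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Input boundary: decode bytes to str exactly once."""
    return raw.decode(encoding)


def shout(text: str) -> str:
    """Internal processing: only ever sees str, so no type checks needed."""
    return text.upper()


def write_text(text: str, encoding: str = "utf-8") -> bytes:
    """Output boundary: encode to whatever the consumer expects."""
    return text.encode(encoding)


# Input arrives as Latin-1 bytes; the consumer wants UTF-8 bytes.
incoming = "café".encode("latin-1")
text = read_text(incoming, "latin-1")
outgoing = write_text(shout(text), "utf-8")
```

Between the two boundary calls, nothing needs to know which encoding
the data came from or where it is going.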
This simplifies things *a lot* compared to the Python 2 world, where you
never knew whether you had to manipulate pure bytes or unicode, and had
to constantly test content in many parts of your code, as you can see in
ReportLab with the many isStr, isBytes, isUnicode, asNative, etc. uses
throughout the code base. I don't despise that; it was a "normal"
consequence of the status of strings on Python 2.
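
As a rough sketch of the difference (as_text and both label_width
functions are hypothetical names for illustration, not ReportLab code),
compare a function that must normalize its argument on every call with
one that can assume str:

```python
def as_text(value, encoding="utf-8"):
    """Hypothetical helper: accept bytes or str, always return str."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def label_width_mixed(value):
    # Mixed-input style: every function pays for a type test and
    # an extra call before doing its real work.
    return len(as_text(value))


def label_width_str(text: str) -> int:
    # str-only style: the caller decoded at the input boundary,
    # so the function just does its job.
    return len(text)
```

Both give the same answer for the same text; the second simply has one
less branch and one less function call per use.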
> If python said it was abandoning byte strings then that would be a
> reason to drop all support for them. That would really annoy the gene
> analysts though :)
This won't happen. Bytes, whether they hold strings or any other type of
content, have legitimate use cases, of course.
> I don't think I would like to apply this patch anytime soon. If others
> have an opinion please speak up.
I totally respect your choice as maintainer. It was a (first-step)
proposal to simplify string handling and also to improve performance
through fewer function calls. I'm not angry if you refuse it; we can
agree to disagree :-)
Regards,
Claude
--
www.2xlibre.net