[reportlab-users] Surprising clash of encoders with pikepdf (with patch)

Lennart Regebro lregebro at shoobx.com
Fri Jan 7 12:16:00 EST 2022


Hi all!

Both Reportlab and PikePDF register "pdfdoc" encodings, which means that
the encoding you actually end up using is arbitrary. I guess it depends
on the import order, but I haven't checked.
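The import-order guess can be checked with two stand-in codecs rather than the real Reportlab and pikepdf ones. `codecs.lookup()` scans the registered search functions in registration order and caches the first hit, so whichever package registers its search function first should win the name. The codec name `demo_pdfdoc` and the `A:`/`B:` tagging encoders are hypothetical and exist only to show which codec ran:

```python
import codecs


def make_codec(label):
    """Build a hypothetical stand-in codec whose output is prefixed with `label`."""

    def encode(text, errors="strict"):
        # Tag the bytes so we can tell which registration handled the call.
        return label.encode("ascii") + b":" + text.encode("latin-1", errors), len(text)

    def decode(data, errors="strict"):
        return bytes(data).decode("latin-1", errors), len(data)

    return codecs.CodecInfo(encode, decode, name="demo_pdfdoc")


def search_a(name):
    # Stands in for the first package that registers the name.
    return make_codec("A") if name == "demo_pdfdoc" else None


def search_b(name):
    # Stands in for a second package registering the same name.
    return make_codec("B") if name == "demo_pdfdoc" else None


codecs.register(search_a)  # e.g. imported first
codecs.register(search_b)  # e.g. imported second

print("x".encode("demo_pdfdoc"))  # b'A:x': the first registration wins
```

Because the lookup result is cached, later registrations of the same name have no effect once the name has been looked up.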

That's all well and good in itself, and shouldn't be a problem, but alas,
PikePDF's encoding uses the qpdf library, and that library will not
tell you which character failed to encode. Therefore it can't raise
UnicodeEncodeError, which requires that information, and raises ValueError
instead. This is actually allowed by the docs for encode() and decode():

"encoding errors raise ValueError
<https://docs.python.org/3/library/exceptions.html#ValueError> (or a more
codec specific subclass, such as UnicodeEncodeError
<https://docs.python.org/3/library/exceptions.html#UnicodeEncodeError>)" -
https://docs.python.org/3/library/codecs.html

In other places it says "Raise UnicodeError
<https://docs.python.org/3/library/exceptions.html#UnicodeError> (or a
subclass); this is the default. Implemented in strict_errors()
<https://docs.python.org/3/library/codecs.html#codecs.strict_errors>." -
https://docs.python.org/3/library/codecs.html#error-handlers
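The exception hierarchy behind this is easy to check. UnicodeEncodeError is a subclass of UnicodeError, which is itself a subclass of ValueError, and a UnicodeEncodeError must be constructed with the failing position. A codec that only knows *that* encoding failed can therefore raise UnicodeError or plain ValueError, but not UnicodeEncodeError. A minimal sketch (the `classify` helper is hypothetical, just to show which handler fires):

```python
# UnicodeEncodeError -> UnicodeError -> ValueError
assert issubclass(UnicodeEncodeError, UnicodeError)
assert issubclass(UnicodeError, ValueError)

# A UnicodeEncodeError needs the encoding, the input, and the failing span.
err = UnicodeEncodeError("pdfdoc", "a\u4e2db", 1, 2, "example failure")
print(err.start, err.end)  # 1 2


def classify(exc):
    """Report which handler in a typical except-chain catches `exc`."""
    try:
        raise exc
    except UnicodeEncodeError:
        return "UnicodeEncodeError handler"
    except UnicodeError:
        return "UnicodeError handler"
    except ValueError:
        return "ValueError handler"


print(classify(err))                          # UnicodeEncodeError handler
print(classify(UnicodeError("no position")))  # UnicodeError handler
print(classify(ValueError("no position")))    # ValueError handler
```

Code that only catches UnicodeEncodeError lets the last two cases propagate.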

I made a PR to change pikepdf's error from ValueError to UnicodeError, which
has been implemented, but that only fixes half the problem. I believe
Reportlab should make one or both of these minor changes:

1. Catch UnicodeError instead of UnicodeEncodeError when the "pdfdoc" encoding
is used. This should have no drawbacks.

2. When it uses the pdfdoc codec it should use it directly, and not via the
"pdfdoc" name.
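A sketch of the first change, under stated assumptions: `encode_pdf_text` and the `demo_qpdf_like` codec are hypothetical stand-ins, not Reportlab's or pikepdf's actual code. The stand-in codec mimics pikepdf after the PR, raising a bare UnicodeError with no position. The fallback chosen here is UTF-16BE with a byte-order mark, the other encoding the PDF format allows for text strings. `except UnicodeError` catches both the stand-in's error and the standard codecs' UnicodeEncodeError:

```python
import codecs


def _qpdf_like_encode(text, errors="strict"):
    # Hypothetical stand-in: like a codec backed by a library that only
    # reports failure, it raises a bare UnicodeError without a position.
    if any(ord(ch) > 0x7F for ch in text):
        raise UnicodeError("cannot encode text")
    return text.encode("ascii"), len(text)


def _qpdf_like_decode(data, errors="strict"):
    return bytes(data).decode("ascii", errors), len(data)


def _search(name):
    if name == "demo_qpdf_like":
        return codecs.CodecInfo(_qpdf_like_encode, _qpdf_like_decode, name=name)
    return None


codecs.register(_search)


def encode_pdf_text(text, codec_name="demo_qpdf_like"):
    """Encode with a single-byte codec, falling back to UTF-16BE with a BOM."""
    try:
        return text.encode(codec_name)
    except UnicodeError:  # broader than UnicodeEncodeError
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


print(encode_pdf_text("abc"))                # b'abc'
print(encode_pdf_text("\u4e2d"))             # b'\xfe\xffN-'
print(encode_pdf_text("\u4e2d", "latin-1"))  # b'\xfe\xffN-'
```

With `except UnicodeEncodeError` instead, the second call would propagate the stand-in codec's UnicodeError.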

The first fix is trivial, so I didn't make a patch for it, but I've attached
a patch for the second fix.
I hope attachments work on this list.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pdfdoc.diff
Type: text/x-patch
Size: 3599 bytes
Desc: not available
URL: <https://pairlist2.pair.net/pipermail/reportlab-users/attachments/20220107/57fb85b4/attachment.bin>


More information about the reportlab-users mailing list