[reportlab-users] Possible bug in ReportLab 3.5.68

Robin Becker robin at reportlab.com
Wed Jul 21 08:30:04 EDT 2021


Hi David,

This got stuck in our support list (for license holders only), but it came through to me as being of interest. Normally,
open-source users should send bug reports to the users list, reportlab-users at lists2.reportlab.com. You can sign up at

https://pairlist2.pair.net/mailman/listinfo/reportlab-users

I took a look at your issue and found that there is an encoding difference between the two PDFs.

In works.pdf:
<< /CALS_LayerMetadata << >> /Intent [ /View /Design ] /Name (Sonderfarbe "Führungslinien") /Type /OCG /Usage << 
/CreatorInfo << /Creator (callas pdfToolbox) /Subtype /Artwork >> >> >>


In fail.pdf:
<< /CALS_LayerMetadata << >> /Intent [ /View /Design ] /Name (Sonderfarbe "F\200hrungslinien") /Type /OCG /Usage << 
/CreatorInfo << /Creator (callas pdfToolbox) /Subtype /Artwork >> >> >>

The problem is the string (Sonderfarbe "F\200hrungslinien") in fail.pdf; it cannot be encoded using the pdfdoc encoding. The
equivalent string in works.pdf is fine, because the u-umlaut (ü) is encodable.

I'm not sure exactly where this string comes from or what it is used for in the PDF, but it's the problem here.

It may well be that there is a mechanism for using the \200 escape, but obviously that's not working here.
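
For reference, here is a small sketch (not part of the session I ran) of how the \200 in the fail.pdf string relates to the '\x80' used in the Python session further down. PDF literal strings allow \ddd octal escapes, and octal 200 is the byte 0x80. The variable names are just for illustration.

```python
# \200 in a PDF literal string is an octal escape for a single byte.
octal_value = int("200", 8)
print(hex(octal_value))  # 0x80

# Python string and bytes literals use the same octal notation,
# so '\200' and '\x80' denote the same character.
same_char = "\200" == "\x80"
print(same_char)  # True

# The raw bytes of the /Name value as they appear in fail.pdf.
raw_name = b'Sonderfarbe "F\200hrungslinien"'
position = raw_name.index(b"\x80")
print(position)  # 14
```

The offset 14 is the same position that the UnicodeEncodeError in the session below reports.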

(david-vogt) robin@minikat:~/devel/david-vogt
$ python
Python 3.10.0b4 (default, Jul 13 2021, 14:40:45) [GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from reportlab.pdfbase.pdfdoc import PDFDocument
>>> pdfdoc=PDFDocument()
>>> sworks = 'Sonderfarbe "Führungslinien"'
>>> sfail ='Sonderfarbe "F\x80hrungslinien"'
>>> sworks.encode('pdfdoc')
b'Sonderfarbe "F\xfchrungslinien"'
>>> sfail.encode('pdfdoc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/robin/devel/david-vogt/lib/python3.10/site-packages/reportlab/pdfbase/rl_codecs.py", line 1000, in encode
    return charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\x80' in position 14: character maps to <undefined>
>>>
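
If it helps with tracking down similar strings, the check above could be generalised along these lines. This is only a sketch: find_unencodable is a hypothetical helper, not a ReportLab function. To keep it runnable without ReportLab, it uses Python's built-in cp1252 codec as a stand-in, since that codec also has no mapping for the character '\x80'. Once reportlab.pdfbase.pdfdoc has been imported as in the session above, the 'pdfdoc' codec name could presumably be passed instead.

```python
def find_unencodable(text, encoding):
    """Return (index, character) pairs that `encoding` cannot represent.

    Hypothetical helper for illustration; not part of ReportLab.
    """
    bad = []
    for index, char in enumerate(text):
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            bad.append((index, char))
    return bad


sworks = 'Sonderfarbe "Führungslinien"'
sfail = 'Sonderfarbe "F\x80hrungslinien"'

# Assumption: cp1252 stands in for the 'pdfdoc' codec used in the session.
STAND_IN_ENCODING = "cp1252"

works_result = find_unencodable(sworks, STAND_IN_ENCODING)
fail_result = find_unencodable(sfail, STAND_IN_ENCODING)
print(works_result)
print(fail_result)
```

With the stand-in codec, the first list is empty and the second contains the single offending character at index 14.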

--
Robin Becker


On 21/07/2021 09:53, David Vogt wrote:
> Hi,
> 
> We're maintaining an application that uses reportlab together with pdfrw
> to annotate PDF documents. The application has worked for two years with
> no problem.
> 
> I've now come across some PDFs that cause ReportLab to crash,
> potentially due to misinterpretation of a character encoding. The
> documents were created using InDesign 16.3.
> 
> I'm attaching a small test script and two PDF files to reproduce the
> issue. Also attached is a requirements.txt listing the exact versions
> we're using. Python version is 3.8.2.
> 
> 
> Run python3 test.py works.pdf to run with a functioning pdf
> Run python3 test.py fail.pdf to reproduce the problem
> 
> 
> Thank you so much for ReportLab! And thanks in advance for looking into
> this.
> 
> 
> Best regards,
> David Vogt

