[reportlab-users] Filenames on Windows

Wietse Jacobs wietse.j at gmail.com
Fri Jul 31 07:32:01 EDT 2009


Hello,

We've run into a problem on Windows when saving a PDF file with ReportLab
if the filename contains non-ASCII characters.

The problem lies in how Python's "open(filename)" behaves on Windows,
combined with the fact that ReportLab always encodes a unicode filename
as UTF-8:

In pdfbase.PDFDocument.SaveToFile, the filename used to save the file
is run through utf8str, which encodes any unicode string as UTF-8 and
returns str(input) for anything else. In general that is "a good
thing", but not for filenames on Windows. On Windows, stock CPython's
open() uses the Windows Unicode API when it is passed a unicode
filename, but the legacy (ANSI, non-Unicode) API when it receives a
bytestring. The ANSI API interprets those bytes in the system's ANSI
code page rather than as UTF-8, so every non-ASCII character in a
UTF-8 encoded name ends up garbled.
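To make the garbling concrete, here is a small simulation in Python 3.
It doesn't call any Windows API; it only mimics what the ANSI API does
with the bytes. It assumes a Western-European system whose ANSI code
page is cp1252, and name_seen_by_ansi_api is a made-up helper name.

```python
# Simulation only: mimics how the legacy (ANSI) Windows file API would
# interpret a filename that was encoded as UTF-8 before calling open().
# ASSUMPTION: the system ANSI code page is cp1252 (Western European);
# other code pages garble the same bytes differently.
ANSI_CODE_PAGE = 'cp1252'


def name_seen_by_ansi_api(unicode_name):
    """Hypothetical helper: return the name the ANSI API would see."""
    utf8_bytes = unicode_name.encode('utf-8')
    # The ANSI API decodes the bytes with the ANSI code page, not UTF-8.
    # errors='replace' because cp1252 leaves a few byte values undefined.
    return utf8_bytes.decode(ANSI_CODE_PAGE, errors='replace')


examples = {}
for ch in ['A', '\u00E8', '\u03A3', '\u03A9']:
    name = 'utf8-char_' + ch + '.txt'
    examples[name] = name_seen_by_ansi_api(name)

# 'utf8-char_A.txt' is unchanged (plain ASCII)
# 'utf8-char_\u00E8.txt' (e grave, bytes C3 A8) becomes 'utf8-char_\u00C3\u00A8.txt'
# 'utf8-char_\u03A3.txt' (Sigma, bytes CE A3) becomes 'utf8-char_\u00CE\u00A3.txt'
```

The ASCII name passes through untouched, which fits what we see: only
the names containing non-ASCII characters come out wrong.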

If I run the following script on a Mac, everything looks fine, but on
Windows the files opened with UTF-8 encoded names get garbled names:

-----8<-------------------------------------

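# Python 2: compares passing open() a UTF-8 encoded bytestring
# with passing it a unicode filename.

def try_char(char):
    # Bytestring filename: on Windows this goes through the ANSI API
    name = u'utf8-char_' + char + u'.txt'
    f = open(name.encode('utf8'), 'wb')
    f.write('using character "')
    f.write(char.encode('utf8'))
    f.write('"\n')
    f.close()

    # Unicode filename: on Windows this goes through the Unicode API
    name = u'unicode-char_' + char + u'.txt'
    f = open(name, 'wb')
    f.write('using character "')
    f.write(char.encode('utf8'))
    f.write('"\n')
    f.close()

for c in ['A', u'\u00E8', u'\u03A0', u'\u03A3', u'\u03A9']:
    try_char(c)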

-----8<-------------------------------------

If I remove the call to utf8str in SaveToFile, everything works. Is
there a reason to keep this call that I'm not seeing?
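For comparison, a minimal Python 3 sketch of the proposed change: hand
the filename to open() exactly as the caller gave it instead of encoding
it first. filename_for_open and save_bytes are hypothetical names, not
ReportLab functions, and save_bytes only stands in for the file-writing
step of SaveToFile. (Python 3.6+ on Windows also decodes bytes filenames
as UTF-8, per PEP 529, so the sketch shows the approach rather than
reproducing the bug.)

```python
import os
import tempfile


def filename_for_open(filename):
    # Hypothetical helper (not part of ReportLab): text and byte filenames
    # are returned unchanged, so a text name reaches open() as text and,
    # on Windows, the Unicode file API is used.
    if isinstance(filename, (str, bytes)):
        return filename
    # Path-like objects (e.g. pathlib.Path) are converted, not re-encoded;
    # anything else makes os.fspath raise TypeError.
    return os.fspath(filename)


def save_bytes(data, filename):
    # Stand-in for the file-writing step of SaveToFile (assumption: the
    # real method does more than this).
    with open(filename_for_open(filename), 'wb') as f:
        f.write(data)


# Demo: write a file whose name contains a non-ASCII character.
with tempfile.TemporaryDirectory() as tmp:
    save_bytes(b'%PDF-1.4\n', os.path.join(tmp, 'report_\u00E8.pdf'))
    written_names = os.listdir(tmp)
```

Using os.fspath instead of a blanket str() means an unexpected object
fails loudly rather than being silently turned into a filename.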

--
--Wietse

