[reportlab-users] CJK fonts

Glenn Linderman v+python at g.nevcal.com
Mon Jun 20 15:44:14 EDT 2011


On 6/20/2011 2:32 AM, Robin Becker wrote:
> On 18/06/2011 06:58, Glenn Linderman wrote:
>> So in generating some documents from UTF-8 text files, most of which
>> are in European languages, but one is in Chinese (Simplified script),
>> I discovered that the Times-New-Roman font that I'd been using for the
>> European languages doesn't contain the CJK characters. So the Chinese
>> text I have also has some European characters.
>>
>> Does the ReportLab API provide a way of selecting multiple fonts at
>> the same time, so that characters not in one font will be found in
>> another?
> .........
>
> For the T1 fonts we have such a mechanism via a list of substitution
> fonts. That's used in pdfbase.pdfmetrics.unicode2T1 to fix up any
> encoding issues from the available fonts. That's reasonable for T1
> because the maximum number of glyphs is 256.
>
> In the TTF fonts the assumption is that they cover all of utf8/unicode
> and we make lazy subset fonts so we don't get errors at the right
> time; in fact we only detect that the font lacks a glyph when we are
> building the subset. That means we might end up trying to build a
> subset for a different font in the middle of building subsets. I'm not
> sure how feasible it would be to do that.


So I'm just a brand new reportlab user, struggling with the need to use
Python 2 (my focus is Python 3, I'm a fairly new Python user as well,
and started with 3) to use reportlab at all, and now trying to figure
out why my PDF with Chinese is mostly blank characters instead of
Chinese characters.

So I think I'm using TTF fonts; I doubt that T1 fonts would even work
for Chinese, given the 256-glyph limit. But in my text editor and
browser, either they or Windows do the appropriate font substitutions,
and the Chinese, since it is UTF-8, "just works".

I did earlier discover that the use of the "basic postscript font set"
didn't work for all European languages, because it seems to be limited
to the character repertoire of Latin-1 or something, even though I was
feeding in Unicode. So I had to select Times-New-Roman instead of Times
Roman... and that got me started down the path of TTF fonts, I guess.
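
A quick way to see which characters in a text fall outside an 8-bit
repertoire is to try encoding each one. This is only a diagnostic
sketch: the function name is made up for illustration, and cp1252 is an
assumption standing in for whatever encoding the standard fonts
actually use.

```python
def chars_outside(text, encoding="cp1252"):
    """Return the distinct characters of text that the codec cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Keep first-seen order and skip duplicates.
            if ch not in missing:
                missing.append(ch)
    return missing
```

Anything this returns for a given document is a character that would
need a font with a wider repertoire, such as a TTF font.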

I'm rather ignorant of Windows font APIs. I tried to research them some
time back and discovered there were at least 4 different font APIs
available, but I couldn't figure out which were the new or recommended
ones, and since I didn't know which one to study, I never figured out
any of them. So I don't know why it "just works" in the text editor and
browser, or why it couldn't "just work" in reportlab, even if that
meant the embedded subset font contained characters from a substituted
font, since that is what is available on the machine that is creating
the PDF.

Is there any good reference material for Windows font APIs? I'm not
even sure what Chinese font is in use on my computer to be substituted
in for the characters that are not in Times-New-Roman, nor how to
determine that, as a first step to specifying it for use in reportlab.
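
As a first step, one could simply look in the Windows font directory for
files that commonly hold Simplified Chinese fonts. The sketch below
assumes the fonts live under %WINDIR%\Fonts and that the listed file
names (SimSun, SimHei, Microsoft YaHei) are the ones to look for; both
are guesses about a typical installation, and the function name is
hypothetical. It does not tell you which font Windows itself
substitutes.

```python
import os

# Assumed candidates: file names that Simplified Chinese fonts commonly
# have on Windows. Treat this as a guess, not a complete list.
CANDIDATES = ("simsun.ttc", "simhei.ttf", "msyh.ttf", "msyh.ttc")


def find_cjk_fonts(font_dir, candidates=CANDIDATES):
    """Return full paths of candidate font files present in font_dir."""
    try:
        # Map lower-cased names to the real names for case-insensitive lookup.
        present = {name.lower(): name for name in os.listdir(font_dir)}
    except OSError:
        return []  # directory missing or unreadable
    return [os.path.join(font_dir, present[c])
            for c in candidates if c in present]


if __name__ == "__main__":
    windir = os.environ.get("WINDIR", r"C:\Windows")
    for path in find_cjk_fonts(os.path.join(windir, "Fonts")):
        print(path)
```

Any path it prints is a candidate to register with reportlab as a TTF
font.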
Whatever it is, it must come with Windows.

Is there any good reference material for how to solve my problem above
using reportlab? I could probably figure out and code a solution, if I
knew where to start...

Is it possible somehow to tell reportlab, before rendering, which
subsets of which fonts should be included, as a way to avoid building a
subset of one font while building a subset of a different font? Or is
it simply not possible to create a PDF that needs multiple fonts in
reportlab?
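
One workaround that needs no support from the library is to split the
text into runs up front and draw each run in its own font, so that no
font is ever asked for a glyph it lacks. The sketch below is an
assumption-laden illustration, not reportlab's own mechanism: it picks
the font by Unicode range rather than by checking each font's actual
glyph coverage, it expects text to be a Unicode string, and the
function names and font names are hypothetical.

```python
# Code point ranges treated as "needs the CJK font". This is a heuristic;
# it does not inspect what glyphs either font really contains.
CJK_RANGES = (
    (0x3000, 0x303F),  # CJK Symbols and Punctuation
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xFF00, 0xFFEF),  # Halfwidth and Fullwidth Forms
)


def is_cjk(ch):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def split_runs(text, latin_font, cjk_font):
    """Group consecutive characters into (font_name, run) pairs."""
    runs = []
    for ch in text:
        font = cjk_font if is_cjk(ch) else latin_font
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)  # extend the current run
        else:
            runs.append((font, ch))  # start a new run
    return runs


def draw_mixed(canvas, x, y, text, size, latin_font, cjk_font):
    """Draw text on one line, switching fonts per run; return the end x."""
    from reportlab.pdfbase import pdfmetrics  # imported lazily on purpose
    for font, run in split_runs(text, latin_font, cjk_font):
        canvas.setFont(font, size)
        canvas.drawString(x, y, run)
        x += pdfmetrics.stringWidth(run, font, size)
    return x
```

Both font names would first have to be registered, for example with
pdfmetrics.registerFont(TTFont(name, path)) from
reportlab.pdfbase.ttfonts. This handles a single line drawn on a
canvas; wrapped paragraphs would need the same run splitting applied
before layout.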