[reportlab-users] CJK fonts

Mon Jan 5 14:00:45 EST 2015

On 1/4/2015 1:01 AM, Andy Robinson wrote:
> On 4 January 2015 at 05:18, Glenn Linderman <v+python at g.nevcal.com> wrote:
>> But this user doesn't always know. Now I could make an assumption about
>> certain character ranges being in certain fonts, but is there a way I can
>> ask reportlab "Does this character exist in this font?"
> Give us some time to remind ourselves how it works - maybe another 3.5
> years ;-)

:)  Well, I did learn a lot about fonts in the 3.5 years... enough to 
find a workaround myself, after sending the above... but... not an 
obvious API!

> Remember that we don't draw the text, Adobe Reader (or
> whatever you use instead) does, and it's quite happy being fed IDs of
> glyphs which don't exist and/or they may introduce font substitution
> mechanisms of their own at any time.  So while it's useful to know
> your font might lack a glyph, it doesn't mean it won't get displayed
> somehow.

Right now, there are no substitutions. If I create something now, I want 
it viewable now, not in 3.5 years :) Even though I've been waiting 3.5 
years to create it :)

And since by the time Reader (or Sumatra, which is what I use instead) 
gets the character codes, they've been substituted out of their original 
positions and fonts, and they would hardly know what to substitute with...

> There is no convenient high-level API but we ought to add one.  Robin,
> how feasible would that be?

The workaround I figured out is as follows. When registering the fonts 
for use, save the result of the TTFont call as, for example, theFont.  
Then a character can be checked for existence in theFont, by doing

if ord(character) in theFont.face.charToGlyph:
    # it exists
    pass
else:
    # doesn't exist, use some other font
    pass

This can be done to help in generating the <font> directives to pass in 
to reportlab.
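
Wrapped up as helpers, the check might look like the sketch below. The names has_glyph and missing_chars are made up for illustration. The only thing assumed about the font object is the face.charToGlyph mapping used above, which is not a documented API and could change.

```python
def has_glyph(font, character):
    """Return True if the font has a glyph mapped for this character.

    Assumes `font` is the object returned by the TTFont call and that it
    exposes face.charToGlyph (a mapping keyed by code point), as in the
    workaround above. That attribute is undocumented.
    """
    return ord(character) in font.face.charToGlyph


def missing_chars(font, text):
    """Return the distinct characters of `text` the font lacks, in order."""
    missing = []
    for ch in text:
        if ch not in missing and not has_glyph(font, ch):
            missing.append(ch)
    return missing
```

Calling missing_chars(theFont, someText) before building a paragraph shows which characters need to come from another font.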

In HTML/CSS, font substitutions are done automatically, based on the 
existence of characters in the font. So one can specify a sequence of 
fonts to check to find the characters. This is convenient, and 
reportlab's platypus looks a lot like HTML :)  HTML/CSS apparently also 
provides a way to define that a particular subset of characters should 
be used from a particular font, although I haven't bothered to learn all 
those details as yet, but that could also be useful in some 
circumstances that are more complex than mine.

For the sequence of fonts, CSS allows a comma separated list of fonts, 
and it will search for the character in the fonts in the specified 
order, using the first font in which the character is found.  That 
gives the user a fair bit of control.  If the character is still not 
found, it does its own substitution if it can find the character in 
some other font from its default set of fonts.  The user gets even more 
control by specifying the character ranges, however that is done.

So for generating the HTML version of my data, I just use  "Times New 
Roman", "SimSun" and get the characters I need.  This has filled the 3.5 
year gap.  Printing HTML doesn't produce results as pretty as directly 
generated PDF files, but there were other priorities.

Lacking the priority list of fonts in reportlab, I can use the above 
technique to figure out how to add appropriate <font> wrappers for text 
subsets. The above technique doesn't seem to be a documented API; I had 
to do lots of digging to find it. A documented API would be better, but 
the HTML/CSS features would be even better.
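
A rough sketch of a CSS-like priority list built on that check follows. pick_font and wrap_fonts are hypothetical names, not reportlab functions, and the fonts are again assumed to expose face.charToGlyph. Characters found in none of the fonts fall back to the first font in the list, which is my own choice here, not anything reportlab does.

```python
from itertools import groupby
from xml.sax.saxutils import escape


def pick_font(character, fonts):
    """Return the name of the first font in `fonts` that has the character.

    `fonts` is a list of (name, font) pairs in priority order; each font
    is assumed to expose face.charToGlyph. Falls back to the first font
    name when no font in the list has the character.
    """
    for name, font in fonts:
        if ord(character) in font.face.charToGlyph:
            return name
    return fonts[0][0]


def wrap_fonts(text, fonts):
    """Wrap each run of same-font characters in a <font name="..."> tag."""
    parts = []
    for name, run in groupby(text, key=lambda ch: pick_font(ch, fonts)):
        # Escape &, < and > so the text survives paragraph markup parsing.
        parts.append('<font name="%s">%s</font>' % (name, escape("".join(run))))
    return "".join(parts)
```

With the fonts saved at registration time, something like wrap_fonts(text, [("Times", timesFont), ("SimSun", simsunFont)]) would give markup to hand to a platypus Paragraph. The font names and variables there are placeholders.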

> On this front, I note that Just van Rossum's fonttools package is
> being updated; Behdad Esfahbod, who's very well known for his work on
> rendering Persian text and now works for Google, has been bringing it
> up to date:
>
>      https://github.com/behdad/fonttools/
>
> This is not what we use for speed reasons but it's probably the right
> tool if you want to query and understand what's in a font.
>
Thanks for the tip.