[reportlab-users] CJK fonts

Andy Robinson andy at reportlab.com
Mon Jan 5 15:01:09 EST 2015


Thanks for all this.  Our font guru is out this week, but we could
certainly expose something a bit friendlier to test if a font contains
a character (or, more usefully, a set of characters).

In principle it would be a really nice feature to allow font
substitution, but the issue is performance.  We have always striven to
make reportlab as fast as we reasonably can when making big documents,
as it's used to do things like generating millions of pages of manuals
per month when editors change content, or making 30-40 page legal
agreements on the fly.   The slowest part in most real-world documents
is paragraph wrapping, which requires us to look up the width of every
character to size the words.   There is a 'stringWidth' function which
is highly optimised; if it needed to be able to stop, raise an
exception and backtrack to generate invisible font fragments, it
could slow things down a lot.   Maybe we can find a way to handle it
so the default case still performs fast, though.  I can't promise when
we will look at it, but clearly we should.

- Andy

On 5 January 2015 at 19:00, Glenn Linderman <v+python at g.nevcal.com> wrote:
> On 1/4/2015 1:01 AM, Andy Robinson wrote:
>
> On 4 January 2015 at 05:18, Glenn Linderman <v+python at g.nevcal.com> wrote:
>
> But this user doesn't always know. Now I could make an assumption about
> certain character ranges being in certain fonts, but is there a way I can
> ask reportlab "Does this character exist in this font?"
>
> Give us some time to remind ourselves how it works - maybe another 3.5
> years ;-)
>
>
> :)  Well, I did learn a lot about fonts in the 3.5 years... enough to find a
> workaround myself, after sending the above... but... not an obvious API!
>
> Remember that we don't draw the text; Adobe Reader (or
> whatever you use instead) does. It's quite happy being fed IDs of
> glyphs which don't exist, and viewers may introduce font substitution
> mechanisms of their own at any time.  So while it's useful to know
> your font might lack a glyph, it doesn't mean it won't get displayed
> somehow.
>
>
> Right now, there are no substitutions. If I create something now, I want it
> viewable now, not in 3.5 years :) Even though I've been waiting 3.5 years to
> create it :)
>
> And since by the time Reader (or Sumatra, which is what I use instead) gets
> the character codes, they've been substituted out of their original
> positions and fonts, and they would hardly know what to substitute with...
>
> There is no convenient high-level API but we ought to add one.  Robin,
> how feasible would that be?
>
>
> The workaround I figured out is as follows. When registering the fonts for
> use, save the result of the TTFont call as, for example, theFont.  Then a
> character can be checked for existence in theFont, by doing
>
> if ord( character ) in theFont.face.charToGlyph:
>     # it exists
>     pass
> else:
>     # doesn't exist, use some other font
>     pass
>
> This can be done to help in generating the <font> directives to pass in to
> reportlab.
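
A minimal sketch of that check extended to a whole string, assuming the
charToGlyph mapping behaves like a dict keyed by code point (as the snippet
above implies). missing_chars is a hypothetical helper, not a reportlab API,
and the latin_only dict is a stand-in for theFont.face.charToGlyph so the
example runs without a font file.

```python
def missing_chars(text, char_to_glyph):
    """Return the distinct characters of `text` with no entry in the mapping.

    `char_to_glyph` is assumed to map code points (ints) to glyph ids, like
    the undocumented theFont.face.charToGlyph attribute used in the message.
    Characters are returned in first-seen order.
    """
    seen = set()
    missing = []
    for ch in text:
        if ord(ch) not in char_to_glyph and ch not in seen:
            seen.add(ch)
            missing.append(ch)
    return missing


# Stand-in for a Latin-only font's code point -> glyph id mapping (made up).
latin_only = {ord(c): i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz ")}

print(missing_chars("ni hao 你好", latin_only))
```

With a registered font, the call would presumably become
missing_chars(text, theFont.face.charToGlyph); an empty result means the font
covers the whole string.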
>
> In HTML/CSS, font substitutions are done automatically, based on the
> existence of characters in the font. So one can specify a sequence of fonts
> to check to find the characters. This is convenient, and reportlab's
> platypus looks a lot like HTML :)  HTML/CSS apparently also provides a way
> to define that a particular subset of characters should be used from a
> particular font, although I haven't bothered to learn all those details as
> yet, but that could also be useful in some circumstances that are more
> complex than mine.
>
> For the sequence of fonts, CSS allows a comma separated list of fonts, and
> it will search for the character in the fonts using that specified order...
> if still not found, it does its own substitution if it can find the
> character in some other font from its default set of fonts, but if found, it
> uses the first one found, giving the user a fair bit of control.  And more,
> if the user specifies the character ranges, however that is done.
>
> So for generating the HTML version of my data, I just use  "Times New
> Roman", "SimSun" and get the characters I need.  This has filled the 3.5
> year gap. Printing HTML doesn't produce results as pretty as directly
> generated PDF files, but there were other priorities.
>
> Lacking the priority list of fonts in reportlab, I can use the above
> technique to figure out how to add appropriate <font> wrappers for text
> subsets. The above technique doesn't seem to be a documented API; I had to
> do lots of digging to find it. A documented API would be better, but the
> HTML/CSS features would be even better.
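
A rough sketch of the CSS-style priority list described above, applied to
generating <font> wrappers. Everything here is an assumption for illustration:
pick_font and wrap_with_fonts are hypothetical helpers, and each font is
represented by a (name, coverage) pair where coverage is any container of code
points (for example theFont.face.charToGlyph from the workaround). Characters
found in no font fall back to the first font in the list.

```python
from itertools import groupby
from xml.sax.saxutils import escape


def pick_font(ch, fonts):
    """Return the name of the first font whose coverage contains `ch`.

    `fonts` is a list of (name, coverage) pairs in priority order. If no
    font covers the character, the first font's name is returned.
    """
    for name, coverage in fonts:
        if ord(ch) in coverage:
            return name
    return fonts[0][0]


def wrap_with_fonts(text, fonts):
    """Split `text` into runs by chosen font and wrap each run in <font>."""
    parts = []
    # groupby merges consecutive characters that resolve to the same font.
    for name, run in groupby(text, key=lambda ch: pick_font(ch, fonts)):
        # Escape &, < and > so the run is safe inside paragraph markup.
        parts.append('<font name="%s">%s</font>' % (name, escape("".join(run))))
    return "".join(parts)


# Made-up coverage sets standing in for real fonts.
fonts = [
    ("Times", set(map(ord, "abc &"))),
    ("SimSun", {ord("你"), ord("好")}),
]
print(wrap_with_fonts("ab 你好 c&", fonts))
```

The resulting string could then be passed to a Paragraph, provided the font
names match the names used when registering the fonts. The coverage check runs
once per character before layout, so it stays out of the width-lookup path.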
>
> On this front, I note that Just van Rossum's fonttools package is
> being updated; Behdad Esfahbod, who's very well known for his work on
> rendering Persian text and now works for Google, has been bringing it
> up to date:
>
>     https://github.com/behdad/fonttools/
>
> This is not what we use for speed reasons but it's probably the right
> tool if you want to query and understand what's in a font.
>
> Thanks for the tip.
>



-- 
Andy Robinson
Managing Director
ReportLab Europe Ltd.
Thornton House, Thornton Road, Wimbledon, London SW19 4NG, UK
Tel +44-20-8405-6420

