[reportlab-users] Incorrect character composition
Glenn Linderman
v+python at g.nevcal.com
Thu Apr 16 01:46:58 EDT 2015
On 4/15/2015 2:02 AM, Andy Robinson wrote:
> Glenn, my apologies - I had assumed you were discussing "unusual
> languages" without re-reading the original email carefully. It might
> not be that bad.
>
> There are two things we could do in the short term, and I'm keen to
> keep the core library moving forwards:
>
> (1) We could potentially provide a special flowable for kerned titles
> and short phrases. This would of course have to render a glyph at a
> time in Python, doing the lookups and calculations
When writing to fixed-resolution devices, many fonts carry hints for
use at low resolution, and both the rendered glyphs and the character
spacing vary with them. I don't know if PDF supports that directly, but
I noticed when printing from a browser to a PDF printer that the
character spacing was weird in the printed result. When I told the
browser to scale everything up really high, and then told the browser's
printer driver to scale to fit the page, the weird character spacing
went away. So in producing PDF for typesetting, it is best to ignore the
"hints" for low-resolution devices. Of course screens are some of the
lowest-resolution devices, and that is what browsers mostly aim at;
printing is sort of a side effect.
My data would be mostly a short word, or up to 3 lines of outdented
text, without right justification.
> (2) If you can find another open source PDF generator in any language
> which gets it right, and let us know, we can study a "hello world" PDF
> out of that tool and see what it does. This would be a big time
> saver.
There are, I think, 4 issues, the first two of which I could definitely
use if implemented, and which sound relatively easy, but likely have
performance impact. They would enable _higher quality typesetting_ of
Latin-based text into PDF files. The others could be hard, but would be
required to support a wider range of languages with non-Latin fonts. I
did read something recently about Micro$oft producing a font layout
system (they used a different word in the article, which I cannot come
up with right now) for all the various needs of different language
systems... The closest thing I can find with Google right now is their
DirectWrite; whether it incorporates the technology I read about, I
couldn't say, but maybe it does or will. I don't recall if this was
something they were making generally available to improve the world's
typography, or if it was a proprietary come-on to promote/improve
Windows. It sounded pretty general, language-wise.
1. kerning
2. composite glyph positioning
3. Languages with huge numbers of ligatures, where characters appear
differently, even to the point of requiring different glyphs, at the
beginning or end of words (Arabic) or adjacent to other letters (Thai).
4. RTL languages.
1. kerning
My research into kerning is below, since it was somewhat productive.
Most of it was on this list. I have not had time to research composite
glyph positioning, which
Here's a reference to how to emit kerning into a PDF file:
http://stackoverflow.com/questions/18304954/how-is-kerning-encoded-on-embedded-adobe-type-1-fonts-in-pdf-files
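As a rough illustration of what kerning looks like in a PDF content
stream: the TJ operator takes an array of strings and numbers, where each
number is an adjustment in thousandths of a text space unit that is
subtracted from the current position. The sketch below assumes a
single-byte font encoding and a hypothetical kerning table (the pair
values are made up for illustration, in the 1/1000 em units that AFM
files use); it is not how reportlab or Wordaxe does it.

```python
def pdf_escape(text):
    """Escape backslash and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def kerned_tj(text, kern_pairs):
    """Build a TJ operation for `text`.

    kern_pairs maps (left_char, right_char) to an adjustment in 1/1000 em
    (negative means "move the pair closer together").
    """
    parts = []
    run = ""
    for i, ch in enumerate(text):
        run += ch
        if i + 1 < len(text):
            kern = kern_pairs.get((ch, text[i + 1]), 0)
        else:
            kern = 0
        if kern:
            parts.append("(%s)" % pdf_escape(run))
            # TJ subtracts the number from the position, so the sign flips:
            # a kern of -80 is written as 80.
            parts.append(str(-kern))
            run = ""
    if run:
        parts.append("(%s)" % pdf_escape(run))
    return "[" + " ".join(parts) + "] TJ"


# Hypothetical kerning values, not taken from any real font.
HYPOTHETICAL_KERNS = {("A", "V"): -80, ("V", "A"): -80, ("T", "o"): -70}

print(kerned_tj("AVATAR To", HYPOTHETICAL_KERNS))
# [(A) 80 (V) 80 (ATAR T) 70 (o)] TJ
```

Characters with no kerning between them stay in one string, so the
stream only grows where a pair actually needs adjusting.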
On this mailing list, the following messages are about kerning, and the
last two have sample PDF files that claim to have kerning. Perhaps
merging Henning's Wordaxe kerning code into reportlab itself would make
it easier to get it working with flowables. Anyway, it is a start.
From: Henning von Bargen <H.vonBargen at t-p.com>
Date: Tue, 6 Jan 2015 07:16:15 +0000
> Wordaxe does support automatic hyphenation and kerning.
>
> See the SVN trunk (current revision is 110) at
> http://sourceforge.net/p/deco-cow/code/HEAD/tree/trunk/
>
> However, I failed to make it work with RL's ImageAndFlowables class.
> That's why I did not release an official new version.
>
> For an example with kerning support, see the file
> http://sourceforge.net/p/deco-cow/code/HEAD/tree/trunk/tests/test_truetype.py
>
> I agree with Andy that kerning slows the paragraph-wrapping process down,
> so personally I would only use it for headings and title, not for the
> main text content.
From: Dinu Gherman <gherman at darwin.in-berlin.de>
Date: Tue, 6 Jan 2015 11:37:40 +0100
From: Dinu Gherman <gherman at darwin.in-berlin.de>
Date: Tue, 6 Jan 2015 11:39:30 +0100
From: Dinu Gherman <gherman at darwin.in-berlin.de>
Date: Tue, 6 Jan 2015 11:40:57 +0100
2. Composite glyph positioning
Regarding composite characters made from multiple glyphs, the only
scheme I can now find to adjust Y position is described at the very end
of this link:
https://www.safaribooksonline.com/library/view/developing-with-pdf/9781449327903/ch04.html
That shows the use of the Td operator to set both the X and Y position
between glyphs, but doesn't show how to calculate X and Y from font
metrics. It would seem that only linear kerning was a concern and was
optimized in operators when the PDF format was designed (since it
predates Unicode).
The idea of composing glyphs on the fly probably hadn't crossed any
English-speaking minds, back then. The first couple paragraphs at that
link hint at that likelihood.
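To make the Td approach concrete, here is a minimal sketch that places
each glyph individually. It assumes the caller already knows the
absolute (x, y) origin of every glyph; how to derive those from font
metrics is exactly the open question above, so the positions in the
example are made up. Td moves relative to the start of the current text
line, which after a previous Td is the previous glyph's origin, so the
code emits deltas. The output would have to sit inside a BT ... ET text
object with a font already selected.

```python
def pdf_escape(text):
    """Escape backslash and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def positioned_glyphs(glyphs):
    """Emit Td/Tj operators for a list of (char, x, y) tuples.

    x and y are absolute glyph origins in text space, relative to the
    point where the text object starts.
    """
    ops = []
    prev_x, prev_y = 0.0, 0.0
    for ch, x, y in glyphs:
        # Td offsets are relative to the previous line start, so emit the
        # difference from the previous glyph's origin.
        ops.append("%g %g Td" % (x - prev_x, y - prev_y))
        ops.append("(%s) Tj" % pdf_escape(ch))
        prev_x, prev_y = x, y
    return "\n".join(ops)


# Hypothetical positions: a base letter, a mark raised above it, and the
# next base letter back on the baseline.
print(positioned_glyphs([("e", 0, 0), ("'", 1.5, 2), ("t", 5, 0)]))
```

Emitting one Td per glyph is verbose, which fits the observation that
the format's operators were optimized for horizontal adjustments only.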
Speculation: Maybe there is some mechanism to create composite glyphs
from the individual glyphs for the composite character codes, and embed
that composite glyph in the PDF and use its internal code instead of
positioning them in the stream via the Td operator... but I haven't
found that... only a few things that seemed to hint at it. While Unicode
didn't do that, because of the character code explosion that would
result, any given PDF only needs to deal with the characters (individual
or composite) actually used in any particular document. So there _might_
be a tradeoff between complexity of font embedding versus the complexity
of font display.
Maybe somewhat unrelated to the above issues, but interesting:
I also just found
http://www.linuxfoundation.org/images/8/80/Textextraction_slides_small.pdf
which is a bit short on details of how, but suggests that it would be
appropriate and useful, when generating PDF files, to use the
"ToUnicode" feature, whatever it is... I seem to have found it in
section 5.9.2 of the 1.7 version of the PDF reference, although I
haven't absorbed it yet.
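For reference, a ToUnicode CMap is a small stream attached to a font
that maps each character code used in the content stream to one or more
Unicode code points, written as UTF-16BE hex. The sketch below builds
one from a plain dictionary; it assumes two-byte character codes, and
the codes in the example are hypothetical. A single code may map to a
sequence such as a base letter plus a combining mark, which is relevant
to the composite-glyph speculation above.

```python
def to_unicode_cmap(mapping):
    """Build the text of a ToUnicode CMap.

    mapping: {character_code (int, 0..0xFFFF): unicode string}
    """
    def hex_utf16(text):
        return text.encode("utf-16-be").hex().upper()

    items = sorted(mapping.items())
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    # A single bfchar block may hold at most 100 entries.
    for start in range(0, len(items), 100):
        chunk = items[start:start + 100]
        lines.append("%d beginbfchar" % len(chunk))
        for code, text in chunk:
            lines.append("<%04X> <%s>" % (code, hex_utf16(text)))
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines)


# Hypothetical codes: 1 is "A", 2 is a precomposed glyph for e + acute.
print(to_unicode_cmap({1: "A", 2: "e\u0301"}))
```

With such a map in place, a viewer can copy or search the text even
when the codes in the stream are arbitrary glyph indices.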
>
> - Andy