[reportlab-users] Incorrect character composition

Glenn Linderman v+python at g.nevcal.com
Wed Apr 15 04:01:40 EDT 2015


On 4/14/2015 10:25 PM, Marius Gedminas wrote:
> On Tue, Apr 14, 2015 at 12:05:04PM -0700, Glenn Linderman wrote:
>> 6-7 weeks with no response, for a while I thought the list was dead, but now
>> a flurry of messages....
>>
>> I guess I didn't actually ask a question, but is this, like kerning, thought
>> to be too slow to implement, or is it just that the market for reportlab
>> simply doesn't include languages that don't have precomposed glyphs, or
>> something else?
> When I and Vika originally implemented Unicode + TrueType support in
> ReportLab, we didn't implement support for combining characters.  I
> don't remember if the TTF/PDF specifications available at that time included
> such support.
>
> I guess nobody stepped up to add the missing support since then.
>
> Technical details (that might be wrong if the code changed since 2003,
> which it probably did--I wasn't keeping track): ReportLab takes apart
> the TTF and builds multiple fonts, each containing a subset of the
> original glyphs (up to 256).  These subsets discard any and all TTF
> tables not explicitly copied, which I guess include the tables used for
> rendering combining characters in a nice way.

Thanks for the response and explanation of the history... By the time I 
started using reportlab, TTF support already existed. Maybe no one else 
knew that combining character support didn't exist.

In looking at the details of PDF files, it seems that when kerning _is_ 
supported, it is not done by copying TTF kerning tables into the 
embedded PostScript fonts, but rather by explicitly coding the distance 
to advance the caret between glyphs when laying down text.
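
To make that concrete, here is a minimal sketch of how a PDF-writing tool 
might emit explicit kerning with the PDF TJ operator, whose array mixes 
strings with numeric adjustments in thousandths of a text space unit (a 
positive number pulls the next glyph left). The function names and the 
kerning values are made up for illustration, and the sketch assumes a 
simple single-byte encoding; it is not how reportlab does it.

```python
def pdf_escape(s):
    """Escape the characters that are special inside a PDF literal string."""
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def kerned_tj(text, kern_pairs):
    """Build a TJ operator for `text`.

    kern_pairs maps (left, right) character pairs to an adjustment in
    thousandths of an em, negative meaning "tighter" as in font kerning
    data. TJ subtracts its numbers from the advance, so the sign flips.
    """
    parts = []
    run = ""
    for i, ch in enumerate(text):
        run += ch
        nxt = text[i + 1] if i + 1 < len(text) else None
        adjust = kern_pairs.get((ch, nxt), 0) if nxt is not None else 0
        if adjust:
            # Close the current string and emit the adjustment after it.
            parts.append("(%s)" % pdf_escape(run))
            parts.append(str(-adjust))
            run = ""
    if run:
        parts.append("(%s)" % pdf_escape(run))
    return "[%s] TJ" % " ".join(parts)


# Hypothetical kerning values, not taken from any real font.
pairs = {("A", "W"): -80, ("W", "A"): -80, ("A", "Y"): -74}
print(kerned_tj("AWAY", pairs))
```

The viewer never sees a kerning table here; it only sees the adjusted 
advances that the creation tool wrote into the page content.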

While I was waiting for a response, I was speculating about how 
combining characters might be coded in PDF files, but I haven't found 
any that contain combining characters that I have been able to take 
apart and examine (and I'm no expert at such examination).

I speculated that if kerning tables are not copied into the PDF and 
used by the display engines, then the X and Y positioning data for 
combining characters probably is not either. The conclusion of that 
speculation was that it would be the responsibility of the PDF file 
creation tool, which has all the font tables available, to use the 
kerning feature to adjust the X position, and possibly to position the 
Y coordinate absolutely.... I didn't find (but may have overlooked) any 
technique in the PDF specification for adjusting the Y position of the 
combining character short of simply making it its own character stream, 
with an adjusted Y baseline.
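
As a rough sketch of what such tool-side positioning could look like: the 
PDF text state does include a "text rise" parameter, set with the Ts 
operator, which shifts the baseline for subsequent glyphs, and a TJ 
adjustment can back the mark up horizontally. The helper below is 
hypothetical, uses made-up single-byte glyph codes and offsets, and 
assumes the mark glyph has zero advance width; a real tool would have to 
derive the offsets from the font's mark-positioning data.

```python
def place_mark(base_code, mark_code, back_milli, rise):
    """Return content-stream operators that draw a base glyph, then a mark.

    back_milli: how far to move the mark left, in thousandths of a text
                space unit (TJ convention: positive means left).
    rise:       baseline shift for the mark in text space units
                (negative lowers the mark).
    Assumes the mark glyph has zero advance width, so undoing the
    horizontal shift returns to the position just after the base glyph.
    """
    return "\n".join([
        "<%02X> Tj" % base_code,                      # draw the base glyph
        "%g Ts" % rise,                               # shift the baseline
        "[%d <%02X> %d] TJ" % (back_milli, mark_code, -back_milli),
        "0 Ts",                                       # restore the baseline
    ])


# Hypothetical glyph codes and offsets, for illustration only.
print(place_mark(0x41, 0x7E, 250, -1.5))
```

Whether viewers render this as well as a precomposed glyph is something I 
have not verified; it only shows that the operators exist to express it.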

>> On 2/21/2015 1:18 PM, Glenn Linderman wrote:
>>> I've suddenly discovered a need to use Unicode characters that do not fall
>>> into the category of "precomposed glyphs", instead being forced to use
>>> "combining characters" for certain diacritical marks.
>>>
>>> However, the combined result from reportlab looks rather stupid compared
>>> to the results seen in other programs (browsers, text editors, word
>>> processors, etc.).  I even displayed the results in two different PDF
>>> viewers, Sumatra and Adobe, before concluding it must be a reportlab
>>> thing.
>>>
>>> The problem is with the characters called  open o  (upper and lower case),
>>> and  open e (at least lower case, the upper case version looks better, but
>>> that may be more due to the open E being narrower than due to proper
>>> handling) when combined with the combining tilde, and other similar
>>> diacriticals.
>>>
>>> In my sample at http://nevcal.com/temporary/openo.pdf I've also included a
>>> precomposed ã and Õ as well, for comparison of where the tilde should be
>>> placed.  Here are the same characters in email... I note that in my email
>>> client (Thunderbird) the precomposed tildes are slightly closer to the
>>> characters than the combining tilde, but in the reportlab-generated PDF,
>>> the lower case combining tildes are far too high, and those over (wider)
>>> upper case characters are not centered.  Times New Roman font in both this
>>> email (unless your client or the mailing list strips the fonts) and the
>>> PDF.
>>>
>>> Glenn
>>>
>>> ɔãɔ̃ÕƆ̃ɛɛ̃Ɛ̃
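
For anyone wanting to check which of the sample characters have 
precomposed forms, a small sketch using Python's standard unicodedata 
module follows. It only inspects the text; it says nothing about how 
reportlab or a PDF viewer positions the marks.

```python
import unicodedata

COMBINING_TILDE = "\u0303"


def compose(base):
    """Return base + combining tilde after NFC normalization."""
    return unicodedata.normalize("NFC", base + COMBINING_TILDE)


for base in ["a", "O", "\u0254", "\u0186", "\u025b", "\u0190"]:
    composed = compose(base)
    # One code point means Unicode has a precomposed glyph for the pair;
    # two means the combining character survives and must be positioned.
    kind = "precomposed" if len(composed) == 1 else "needs combining mark"
    print("U+%04X %s: %s" % (ord(base), unicodedata.name(base), kind))
```

The open o and open e sequences stay as two code points after 
normalization, which is why they depend on the renderer placing the 
combining tilde correctly.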
