[reportlab-users] Incorrect character composition

Robin Becker robin at reportlab.com
Fri Apr 17 10:22:34 EDT 2015


Who is responsible for glyph positioning? I believe it is the font plus the
renderer.

I wrote the script below to test various diacritic behaviours in reportlab.

The TL;DR is as follows: the TTF fonts seem to know about diacritics. The Adobe
builtins may or may not know about them, but with our standard encoding
Helvetica clearly doesn't.

The script draws space + glyph + diacritic for some upper and lower case roman 
letters. It also draws the same after unicode normalization.
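
The normalization step can be checked on its own with the standard library. This
is only a sketch (the helper name compose is mine, not part of reportlab); it
shows that NFC gives a single code point only when Unicode defines a precomposed
form for the pair.

```python
from unicodedata import normalize

def compose(base, mark):
    """Return the NFC form of base followed by a combining mark."""
    return normalize('NFC', base + mark)

# 'a' + COMBINING TILDE (U+0303) has a precomposed form, U+00E3
a_tilde = compose(u'a', u'\u0303')
# 'Y' + COMBINING RING ABOVE (U+030A) has none, so both code points remain
y_ring = compose(u'Y', u'\u030a')
print(len(a_tilde), len(y_ring))
```

So on the "normalized" pages some cells still hold base + combining mark, and
those depend on the font's own mark placement just as the unnormalized ones do.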

Where seen, all the diacritics have zero width. The DejaVuSans font seems to do
slightly better than Arial in centring the common diacritics. Where available,
the composed glyphs (obtained by normalization) seem much better.

With no width for centring it would seem we need to examine the curves to get
any kind of centring right. DejaVu & Arial have some built-in negative shifts,
as can be seen by examining the tilde:

> C:\tmp>python
> Python 2.7.8 (default, Jun 30 2014, 16:08:48) [MSC v.1500 64 bit (AMD64)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
>>>> from reportlab.pdfbase.pdfmetrics import registerFont
>>>> from reportlab.pdfbase.ttfonts import TTFont
>>>> registerFont(TTFont('DejaVuSans','DejaVuSans.ttf'))
>>>> from reportlab.graphics.charts.textlabels import _text2PathDescription
>>>> p=_text2PathDescription(u'\u0303',fontName='DejaVuSans',fontSize=2048)
>>>> p
> [('moveTo', -518, 1370), (u'lineTo', -575, 1425), (u'curveTo', -589, 1438, -602, 1448, -613, 1454),
> (u'curveTo', -624, 1460, -634, 1464, -643, 1464), (u'curveTo', -668, 1464, -687, 1452, -699, 1427),
> (u'curveTo', -711, 1403, -717, 1364, -719, 1309), (u'lineTo', -844, 1309),
> (u'curveTo', -843, 1399, -825, 1468, -791, 1517), (u'curveTo', -757, 1566, -710, 1591, -649, 1591),
> (u'curveTo', -624, 1591, -601, 1587, -579, 1577), (u'curveTo', -558, 1568, -535, 1552, -510, 1530),
> (u'lineTo', -453, 1475), (u'curveTo', -439, 1462, -426, 1452, -414, 1445),
> (u'curveTo', -404, 1439, -394, 1436, -385, 1436), (u'curveTo', -360, 1436, -341, 1448, -329, 1472),
> (u'curveTo', -317, 1496, -311, 1536, -309, 1591), (u'lineTo', -184, 1591),
> (u'curveTo', -185, 1501, -203, 1432, -237, 1382), (u'curveTo', -271, 1334, -318, 1309, -379, 1309),
> (u'curveTo', -404, 1309, -427, 1313, -449, 1323), (u'curveTo', -470, 1332, -493, 1348, -518, 1370),
> 'closePath']
>>>> registerFont(TTFont('Arial','Arial.ttf'))
>>>> pa=_text2PathDescription(u'\u0303',fontName='Arial',fontSize=2048)
>>>> pa
> [('moveTo', -909, 1547), (u'curveTo', -909, 1615, -891, 1670, -853, 1712),
> (u'curveTo', -816, 1754, -767, 1775, -706, 1775), (u'curveTo', -665, 1775, -609, 1757, -537, 1721),
> (u'curveTo', -498, 1701, -467, 1691, -443, 1691), (u'curveTo', -403, 1691, -378, 1720, -370, 1778),
> (u'lineTo', -240, 1778), (u'curveTo', -244, 1626, -309, 1550, -436, 1550),
> (u'curveTo', -478, 1550, -533, 1568, -602, 1606), (u'curveTo', -646, 1630, -679, 1642, -700, 1642),
> (u'curveTo', -752, 1642, -778, 1611, -776, 1547), (u'lineTo', -909, 1547), 'closePath']
>>>>



i.e. the DejaVuSans curve starts at -518/2048 and goes at least to -844/2048,
but it's clear no single shift can match the various upper and lower case
widths that could occur. The Arial curve is even more negative.
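
To put numbers on that, here is a rough sketch of measuring the horizontal
extent of a path description and the extra shift needed to centre it. The
function names are mine, and two things are assumptions: that the mark is drawn
at the pen position reached after the base glyph, and that "centred" means
centred on the base glyph's advance width rather than on its ink.

```python
def x_extent(path):
    """Return (xmin, xmax) over all points in a path description.

    Operators are tuples such as ('moveTo', x, y) or
    ('curveTo', x1, y1, x2, y2, x3, y3); 'closePath' is a bare string and
    is skipped. Control points bound a Bezier curve, so this can slightly
    overestimate the true outline extent.
    """
    xs = []
    for op in path:
        if isinstance(op, tuple):
            xs.extend(op[1::2])  # coordinates alternate x, y after the name
    return min(xs), max(xs)

def centring_shift(path, base_width):
    """Extra x shift that centres the mark over the preceding glyph.

    Assumption: the target centre is -base_width/2 relative to the pen
    position after the base glyph.
    """
    xmin, xmax = x_extent(path)
    return -base_width / 2.0 - (xmin + xmax) / 2.0

# Extreme points taken from the DejaVuSans tilde outline (fontSize=2048)
tilde = [('moveTo', -518, 1370), ('lineTo', -575, 1425),
         ('lineTo', -844, 1309), ('lineTo', -184, 1591), 'closePath']
print(x_extent(tilde))
print(centring_shift(tilde, 1400))  # 1400 is a made-up base glyph width
```

Under those assumptions the built-in offset of this tilde is exactly right only
for a base glyph 1028 units wide (centre at -514); anything wider or narrower
needs a correction.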

If a combined glyph is in the font we should use it, but I'm not sure we even
have an API for that. TTFont has charToGlyph (unicode --> glyph number), but we
have code to escape if there are no glyph components defined for it, so the
test is quite hard.
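
A minimal sketch of the "use the combined glyph when the font has it" test
might look like this. It is not an existing reportlab API: the function name is
hypothetical, and char_to_glyph is assumed to be any mapping from code point to
glyph number, along the lines of the charToGlyph table mentioned above.

```python
from unicodedata import normalize

def composed_if_available(base, mark, char_to_glyph):
    """Return the precomposed character when Unicode defines one and the
    font maps it to a glyph; otherwise return base + mark unchanged.

    char_to_glyph is an assumed mapping of code point -> glyph number.
    """
    seq = base + mark
    nfc = normalize('NFC', seq)
    if len(nfc) == 1 and ord(nfc) in char_to_glyph:
        return nfc
    return seq

# Toy tables standing in for a real font's cmap
with_atilde = {0x61: 1, 0x303: 2, 0xe3: 3}
without_atilde = {0x61: 1, 0x303: 2}
print(repr(composed_if_available(u'a', u'\u0303', with_atilde)))
print(repr(composed_if_available(u'a', u'\u0303', without_atilde)))
```

A table entry alone may not prove the glyph is actually usable, so this would
only be a first filter.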

Otherwise, generating a missing combined glyph dynamically is probably the way 
to go, but to do that we need information about how each combining character is 
supposed to be positioned. The alternative is to attempt to do the adjustment 
every time we render text using pdf operators; we still need the same information.
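
Either route first has to find which marks belong to which base character. A
sketch of that step, using only the standard library, is below. The function
name is mine, and treating "non-zero canonical combining class" as the
definition of a combining mark is an approximation.

```python
from unicodedata import combining

def clusters(text):
    """Split text into (base, marks) pairs.

    marks holds the characters with a non-zero canonical combining class
    that directly follow base. A mark with nothing before it is treated
    as its own base.
    """
    out = []
    for ch in text:
        if combining(ch) and out:
            out[-1][1] += ch
        else:
            out.append([ch, u''])
    return [tuple(pair) for pair in out]

# 'A' + tilde, 'e' + acute + dot below, then a bare 'x'
print(clusters(u'A\u0303e\u0301\u0323x'))
```

Each (base, marks) pair is then the unit for which positioning information
would be needed, whether it is used to build a glyph or to shift the mark at
render time.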

#################################################################
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.pdfbase.pdfmetrics import registerFont
from reportlab.pdfgen.canvas import Canvas
from reportlab.lib.pagesizes import A4 as pagesize
from reportlab.lib.utils import uniChr
from unicodedata import normalize as unormalize
registerFont(TTFont("Arial", "Arial.ttf"))
registerFont(TTFont("DejaVuSans", "DejaVuSans.ttf"))

c = Canvas('tdiacritics.pdf', pagesize=pagesize)
y0 = pagesize[1]-12

def startPage(fontName, title):
    # set the font, draw the page title and return its baseline
    c.setFont(fontName, 10)
    y = y0-12
    c.drawString(18,y,title)
    return y

def drawDiacritics(fontName, title, normalized):
    # one row per combining mark: code point, width, then samples on A..y
    y = startPage(fontName, title)
    for diacritic in range(0x300,0x370):
        if y-24 < 0:
            c.showPage()
            y = startPage(fontName, title)
        y -= 12
        x = 18
        diacritic = uniChr(diacritic)
        c.drawString(x,y,hex(ord(diacritic)))
        x += 40
        u = u' '+diacritic+(u' w=%s'%c.stringWidth(diacritic))
        c.drawString(x,y,u)
        x += max(c.stringWidth(u),40)
        for g in u'AEIOUYaeiouy':
            u = u' '+g+diacritic
            if normalized:
                u = unormalize('NFC',u)
            c.drawString(x,y,u)
            x += 20
    c.showPage()

for fontName in ('Arial','DejaVuSans','Helvetica'):
    drawDiacritics(fontName, fontName, False)
    drawDiacritics(fontName, fontName+' normalized', True)
c.save()
#################################################################
-- 
Robin Becker

