[reportlab-users] [Bitbucket] Issue #24: Missing glyphs from embedded emoji font (Symbola) (rptlab/reportlab)

Robin Becker robin at reportlab.com
Tue Feb 25 09:09:39 EST 2014


On 25/02/2014 05:40, Ian Wood wrote:

> Issue 24: Missing glyphs from embedded emoji font (Symbola)

> https://bitbucket.org/rptlab/reportlab/issue/24/missing-glyphs-from-embedded-emoji-font

>

> Ian Wood:

>

> oops! Forgot to log in before I logged this issue..


Ian, I'm again copying the users list. Anyhow, using values supplied by Mathieu
Comandon, I created this test script:


############################################################
from reportlab.lib.colors import red
from reportlab.lib.pagesizes import A4
from reportlab.pdfbase import pdfmetrics
from reportlab.pdfbase.ttfonts import TTFont
from reportlab.pdfgen import canvas

width, height = A4
# NOTE: provide the location of Symbola.ttf on your setup...
pdfmetrics.registerFont(TTFont("Symbola", "Symbola.ttf"))
c = canvas.Canvas("test-symbola-font.pdf")
to = c.beginText()
to.setTextOrigin(50,height-50)
to.setFont("Symbola", 30)
to.setFillColor(red)
to.textLine(u"Unicode symbols: \u02a4\U0001F631\U0001F64C\U0001F44C")
to.textLine(b"UTF8 symbols: \xca\xa4\xF0\x9F\x98\xB1\xF0\x9F\x99\x8C\xF0\x9F\x91\x8C")
c.drawText(to)
c.showPage()
c.save()
############################################################
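
As a quick sanity check (a sketch that is not part of the test script above and
does not need reportlab), the UTF-8 byte string can be decoded and compared with
the Unicode literal, to confirm that both text lines ask for the same four code
points:

```python
# Sanity check: the byte string passed to the second textLine() call should
# decode to exactly the code points used in the first one.
unicode_text = u"\u02a4\U0001F631\U0001F64C\U0001F44C"
utf8_bytes = b"\xca\xa4\xF0\x9F\x98\xB1\xF0\x9F\x99\x8C\xF0\x9F\x91\x8C"

decoded = utf8_bytes.decode("utf-8")
code_points = [hex(ord(ch)) for ch in decoded]

print(decoded == unicode_text)
print(code_points)
```

If the comparison holds, any difference between the two rendered lines cannot
come from the input encoding.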

I ran this on Windows with Python 3.3.3. The result shows the dz character
\u02a4, but not the three astral-plane emoji characters; they appear as ? chars.

I looked into the PDF produced by reportlab. We appear to be creating the subset
map correctly; at the end of the definitions I see this:

<7F> <007F>
<80> <02A4>
<81> <1F631>
<82> <1F64C>
<83> <1F44C>

so we've seen those characters and allegedly created glyphs for them. In the
body of the document I see this

(Unicode symbols: \200\201\202\203) Tj T* (UTF8 symbols: \200\201\202\203) Tj T* ET

So we're using the octal escapes for 0x80 0x81 0x82 0x83 in the string. From
this I can only deduce that either we are failing somewhere in the glyph
creation stage (i.e. when building the subset the glyph lookup fails), or we're
building the subset correctly and Acrobat fails to deliver. I suspect the former.
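
The reading of the content stream above can be reproduced with a small sketch.
The dictionary copies the four subset map entries quoted earlier; the helper
name pdf_octal_codes is made up for illustration and is not a reportlab
function:

```python
import re

# Subset map entries as quoted from the PDF: subset code -> Unicode code point.
subset_map = {0x80: 0x02A4, 0x81: 0x1F631, 0x82: 0x1F64C, 0x83: 0x1F44C}

def pdf_octal_codes(literal):
    """Return the byte values of the three-digit octal escapes in a PDF string."""
    return [int(digits, 8) for digits in re.findall(r"\\([0-7]{3})", literal)]

codes = pdf_octal_codes(r"\200\201\202\203")
code_points = [subset_map[code] for code in codes]

print([hex(code) for code in codes])
print([hex(cp) for cp in code_points])
```

So the string in the page body does refer to the subset codes that the map
associates with the dz character and the three emoji.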

Debugging in Marius' ttfonts.py code reveals that we don't seem to read all of
the glyphs. At line 641 our unichars lie in range(startCount[n], endCount[n]+1),
and we are reading startCount & endCount with read_ushort(), so all our unichars
lie in 0 <= unichar <= 0xffff.
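
A minimal sketch of why 16-bit segment bounds cannot cover the emoji. The
read_ushort below is a stand-in written for this example (assumed to behave
like a big-endian unsigned 16-bit read), not the actual method in ttfonts.py:

```python
import struct

def read_ushort(data, offset=0):
    # Stand-in for illustration: big-endian unsigned 16-bit integer.
    return struct.unpack_from(">H", data, offset)[0]

# The largest value a startCount/endCount entry can hold.
largest = read_ushort(b"\xff\xff")

emoji = [0x1F631, 0x1F64C, 0x1F44C]
representable = [cp <= largest for cp in emoji]

# Trying to store an astral code point in 16 bits fails outright.
try:
    struct.pack(">H", 0x1F631)
    fits = True
except struct.error:
    fits = False

print(hex(largest), representable, fits)
```

Whatever the segments contain, a table read this way can never yield a unichar
above 0xffff.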

It seems there must be some kind of extension to let us read Unicode code points
above 0xffff. We're using the first 'unicode' cmap table from this list:


> cmap table 0/4: platFormID=0 encodingID=0 offset=00000024

> cmap table 1/4: platFormID=1 encodingID=0 offset=00000144

> cmap table 2/4: platFormID=3 encodingID=1 offset=0000034e

> cmap table 3/4: platFormID=3 encodingID=10 offset=0000046e


and that appears to be a format 4 table, which is what we read. Evidently
I'm missing something.
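
One candidate for the missing extension is the last entry in that list. In the
OpenType cmap specification, platform 3 / encoding 10 is the Windows UCS-4
encoding and uses a format 12 subtable, whose groups carry 32-bit start and end
character codes. The sketch below assumes that layout and has not been checked
against Symbola.ttf itself; parse_cmap_format12 is a hypothetical helper, not
part of reportlab, and the sample data and glyph indices are synthetic:

```python
import struct

def parse_cmap_format12(data, offset=0):
    """Parse a format 12 cmap subtable into {code point: glyph index}.

    Hypothetical helper for illustration; not a reportlab API.
    """
    # Header: format, reserved (uint16 each), length, language, numGroups (uint32 each).
    fmt, _reserved, _length, _language, n_groups = struct.unpack_from(
        ">HHLLL", data, offset)
    if fmt != 12:
        raise ValueError("expected cmap format 12, got %d" % fmt)
    mapping = {}
    pos = offset + 16
    for _ in range(n_groups):
        # Each group: startCharCode, endCharCode, startGlyphID (all uint32).
        start_char, end_char, start_glyph = struct.unpack_from(">LLL", data, pos)
        pos += 12
        for code in range(start_char, end_char + 1):
            mapping[code] = start_glyph + (code - start_char)
    return mapping

# Synthetic subtable with made-up glyph indices, only to exercise the parser.
groups = [(0x1F44C, 0x1F44C, 10), (0x1F631, 0x1F632, 20)]
header = struct.pack(">HHLLL", 12, 0, 16 + 12 * len(groups), 0, len(groups))
body = b"".join(struct.pack(">LLL", *group) for group in groups)
cmap = parse_cmap_format12(header + body)

print(sorted((hex(code), glyph) for code, glyph in cmap.items()))
```

If that is what the font contains, reading the 3/10 subtable in addition to (or
instead of) the format 4 one would be a way to pick up glyphs above 0xffff.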
--
Robin Becker

