[reportlab-users] RTL Patch Committed

Hosam Aly haly at centrivision.com
Sun Nov 22 05:02:46 EST 2009


Robin Becker wrote:

> Hosam Aly wrote:
> ..........
>> Meanwhile, I read in the PDF standard (version 1.7 from Adobe) that
>> the PDF text object supports receiving UTF-16BE text, provided that
>> it starts with the Unicode Byte Order Mark (BOM, U+FEFF). I wonder
>> what the results would be if we wrote text in UTF-16 instead of
>> writing the code points in the font. I didn't know how to test this,
>> so I hope someone can help me.
> ..........
> We have used UTF-16 in some places in pdfdoc.py. I believe that was
> related to using CJK standard fonts in places where Acrobat Reader
> would normally use PDFDocEncoding, i.e. various comments and document
> description sections.
>
> In principle there's nothing better about using a 16-bit Unicode
> representation for text. I do see that for CMaps there are a lot of
> predefined mappings which correspond to various UTF-16 subsets.
>
> When the font is a built-in font I can see that using a standard
> encoding makes sense. That is certainly the case for the standard AR
> CJK fonts, where the fonts are large and don't have to be embedded.
> However, we are often making up subset fonts for embedding purposes
> and there I don't think it makes sense to use 16bit entries.
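
For reference, the UTF-16BE form discussed above can be sketched in a few lines of Python. This is a minimal illustration, not reportlab's code: the function name `utf16be_text_string` is hypothetical. It assumes the PDF "text string" convention (the bytes FE FF followed by UTF-16BE), written out as a hex string literal.

```python
def utf16be_text_string(text: str) -> bytes:
    """Hypothetical helper: encode text as a PDF text string, i.e. the
    BOM bytes FE FF followed by UTF-16BE, written as a hex literal <...>."""
    # Python's "utf-16-be" codec does not emit a BOM, so prepend it explicitly.
    # Characters outside the BMP become surrogate pairs, as UTF-16BE requires.
    raw = b"\xfe\xff" + text.encode("utf-16-be")
    return b"<" + raw.hex().upper().encode("ascii") + b">"


if __name__ == "__main__":
    # e.g. a value that could go into a document Info entry such as /Title
    print(utf16be_text_string("Title"))
```

This convention covers strings such as document information and comments, which matches where pdfdoc.py uses UTF-16. Strings shown in a content stream are a different case: their bytes are decoded through the current font's encoding or CMap. For that reason, the choice of font and CMap matters more than the byte order mark.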


Hello Robin. I am considering UTF-16 because, as I understand it, we would
then be writing Unicode character code points instead of font code points.
When we write font code points, the reader has no choice but to render
exactly the glyphs we specified. I wanted to see whether, if we wrote
character code points instead, the reader would be smart enough to do the
shaping itself.
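
The difference between shaped output and logical characters can be illustrated with Unicode's Arabic presentation forms. The minimal sketch below uses only the Python standard library, and `to_logical` is a hypothetical name. It shows that a pre-shaped sequence and its logical text use different code points, and that NFKC normalization folds them back together. This is only an analogy: an embedded subset font assigns its own codes to glyphs, and those need not be presentation-form code points. Whether a given reader would shape logical text itself remains the open question.

```python
import unicodedata


def to_logical(text: str) -> str:
    """Hypothetical helper: fold Arabic presentation forms (pre-shaped
    contextual letters) back to logical characters via NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


# Logical text: two ARABIC LETTER BEH (U+0628) characters
logical = "\u0628\u0628"
# Pre-shaped text: BEH INITIAL FORM (U+FE91) followed by BEH FINAL FORM (U+FE90)
shaped = "\ufe91\ufe90"

if __name__ == "__main__":
    for ch in shaped:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    print(to_logical(shaped) == logical)
```

A writer that emits `shaped` has already done the shaping. A writer that emits `logical` leaves the shaping to whatever renders the text.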


Best regards,

Hosam Aly
Software Engineer
Centrivision
+20 (11) 8000-789