[reportlab-users] Hebrew Support Patch

Moshe Wagner moshe.wagner at gmail.com
Sun Jun 7 07:09:07 EDT 2009


Well, I'm not a really great technical writer either, and I'm not
really sure how much background you want. But I'll give the best
description I can think of. As I said, feel free to change anything so
it meets your requirements, or ask me to give more information on any
point you think I didn't get into enough.

(Note: I don't know enough about any other languages, so I'm strictly
speaking about Hebrew. Arabic, for instance, is very similar in terms
of being RTL, but has a few very different properties, such as joined
letters. I do believe fribidi deals with that correctly, and therefore
my patch should add Arabic support too, but I cannot promise that. Is
there anyone who can test this? )

Displaying Hebrew -

The first step for displaying any non-ASCII characters, and therefore
Hebrew as well, is obtaining a font containing its characters. I
didn't check all of the default PDF fonts, but those I did check did
not include Hebrew glyphs.
Instead, I use fonts from the Culmus project (http://culmus.sourceforge.net).
(In the test archive I included one font from there, and its license
file. I hope that's OK.)


'Visual' and 'Logical' ordering -

Once a font with the characters is used, single Hebrew characters can
be displayed, but the words will still come out mirrored, as I'll
explain.
Say we take the word hello, that is, "שלום" ("Shalom"). If you see it
correctly, you will see the character "ש" at the rightmost end of
that word, as it's the first letter, and Hebrew is read from right to
left.
But if we look at the word as an array of chars, 'c_str', the
values are:
c_str[0] = "ש", c_str[1] = "ל", c_str[2] = "ו" and c_str[3]="ם".

So when printed from left to right, as is usually done, the word will
be shown as:
"םולש",
since the characters are printed by their real order (called
'logical'), but start from the wrong side.

To avoid this, a 'visual' ordering is used instead of the 'logical' one.
So the word "שלום" will be stored as -
c_str[0] = "ם", c_str[1] = "ו", c_str[2] = "ל" and c_str[3]="ש", so
when printed from left to right, it will be displayed as
"שלום" - which is the correct order.

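The two orderings can be sketched in a few lines of Python. This is only an illustration, not part of the patch: the letters are written as Unicode escapes (the constant names are made up for this example) so that an editor's own bidi rendering cannot confuse matters, and the plain reversal used here is only valid for a single, purely RTL word.

```python
# "Shalom" in logical (reading) order. The constant names are illustrative.
SHIN, LAMED, VAV, FINAL_MEM = u"\u05e9", u"\u05dc", u"\u05d5", u"\u05dd"
logical = SHIN + LAMED + VAV + FINAL_MEM

# Naive visual order: reverse the characters, so that a renderer drawing
# from left to right puts the first letter (shin) at the far right.
visual = logical[::-1]

# Print code points rather than glyphs, so the storage order is unambiguous.
for name, s in (("logical", logical), ("visual", visual)):
    print(name, ["U+%04X" % ord(ch) for ch in s])
```

Mixed LTR/RTL text cannot be handled by a plain reversal like this; that is the job of a bidi implementation.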
The visual ordering must be used carefully, though, since when
printing text that's split across several lines, it will cause the
order of the lines to switch too, as mirroring affects both axes.
(i.e., "שלום לכם", when each word is on a separate line, will be
םולש
םכל
In logical ordering,
and:
לכם
שלום
In visual, which are both wrong.)
The solution is to mirror every line on its own, and that must be
done inside the wrapping function, not before or after it.
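The difference between mirroring the whole text and mirroring each wrapped line can be sketched as follows. The helpers `wrap_words` and `naive_visual` are hypothetical stand-ins written for this example (a greedy character-count wrapper and a plain reversal); they are not ReportLab or fribidi functions, and the reversal only works for purely RTL text.

```python
def wrap_words(text, max_chars):
    """Greedy word wrap on logical-order text; returns a list of lines."""
    lines, current = [], u""
    for word in text.split():
        candidate = word if not current else current + u" " + word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def naive_visual(line):
    """Stand-in for a real bidi conversion: reverse a purely RTL line."""
    return line[::-1]


SHALOM = u"\u05e9\u05dc\u05d5\u05dd"   # shin, lamed, vav, final mem
LACHEM = u"\u05dc\u05db\u05dd"         # lamed, kaf, final mem
text = SHALOM + u" " + LACHEM

# Wrap first, then mirror each line on its own: line order is preserved.
per_line = [naive_visual(line) for line in wrap_words(text, 4)]

# Mirror everything first, then wrap: the second word lands on the first line.
whole_text = wrap_words(naive_visual(text), 4)
```

Here `per_line` keeps the first word on the first line, while `whole_text` contains the same two mirrored lines in swapped order.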


Fribidi and Pyfribidi -

A library that converts between 'logical' and 'visual' ordering,
testing whether the text is RTL before mirroring it, and supporting
mixed texts, where only the RTL part should be mirrored, is fribidi -
"An implementation of the Unicode Bidirectional Algorithm (bidi)." -
http://fribidi.org/.

The Python binding for this library is called pyfribidi -
http://pyfribidi.sourceforge.net/ . (It does not require fribidi
itself to be installed.)
All versions of pyfribidi should work fine, but I suppose the newest
version should always be used.
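As a rough sketch of how the binding is called: the helper name `to_visual` is made up for this example, and the fallback of returning the text unchanged when pyfribidi is missing is an assumption of the sketch rather than something the library does.

```python
try:
    import pyfribidi
except ImportError:
    pyfribidi = None  # no RTL support; text is passed through untouched


def to_visual(text):
    """Return `text` in visual order if pyfribidi is available.

    base_direction=pyfribidi.ON asks the library to work out the base
    direction from the text itself, so plain LTR text stays as it is.
    """
    if pyfribidi is None:
        return text
    return pyfribidi.log2vis(text, base_direction=pyfribidi.ON)


print(to_visual(u"hello"))
```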

My code -
My code simply uses pyfribidi to add RTL (and mixed LTR and RTL
strings) support to reportlab.
In "canvas.py" it simply runs 'pyfribidi.log2vis' on the text, while
in "paragraph.py" it does so for each line separately.

This is what I have added to the "canvas.py" file, right at the beginning
of the "drawString" function:
######################
# Hebrew text patch, Moshe Wagner, June 2009
# <moshe.wagner at gmail.com>

# Flips the given text with pyfribidi, if it's needed (i.e. Hebrew or Arabic)
# If it could not be imported, it does nothing
# Plain LTR texts will not be affected in any case.
try:
    import pyfribidi
    text = pyfribidi.log2vis(text, base_direction=pyfribidi.ON)

except ImportError:
    import sys
    print >> sys.stderr, "Fribidi module not found; You will not have RTL support for this paragraph"
#####################

And this is what I added to "paragraph.py", in the "wrap" function,
right after the call to "self.breakLines":
######################
# Hebrew text patch, Moshe Wagner, June 2009
# <moshe.wagner at gmail.com>

# This code fixes paragraphs with RTL text.

# It does it by flipping each line separately
# (depending on the type of the line).

# If fribidi can't be imported, it does nothing.
# Plain LTR texts will not be affected in any case.

try:
    import pyfribidi
except ImportError:
    import sys
    print >> sys.stderr, "Fribidi module not found; You will not have RTL support for this paragraph"
else:
    for line in blPara.lines:
        if isinstance(line, (FragLine, ParaLines)):
            # When the line is a FragLine or ParaLines, the
            # text attribute of each of its words is flipped.
            # Then the order of the words is flipped too,
            # so that 2 word parts on the same line
            # will be in the right order.
            for word in line.words:
                word.text = pyfribidi.log2vis(word.text, base_direction=pyfribidi.ON)

            line.words.reverse()

        elif isinstance(line, tuple):
            # When the line is just a tuple whose second value is the text:
            # since I couldn't directly change its value,
            # it's done by merging the words, flipping them,
            # and re-entering them one by one into the second attribute.

            s = ' '.join(line[1])
            s = pyfribidi.log2vis(s, base_direction=pyfribidi.ON)
            line[1][:] = s.split()
        else:
            print line.__class__.__name__
######################
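The tuple branch relies on a Python detail that may be worth spelling out: a tuple cannot have its items reassigned, but a list stored inside it can still be modified in place with slice assignment. The sketch below shows just that mechanism. The function name `flip_tuple_line`, the `(extra_space, words)` shape of the tuple, and the `naive_reverse` stand-in used instead of pyfribidi are all assumptions made for illustration.

```python
def flip_tuple_line(line, to_visual):
    """Reorder the words of a (extra_space, words) line in place.

    `to_visual` is any callable mapping a logical-order string to a
    visual-order one, e.g. a wrapper around pyfribidi.log2vis.
    """
    words = line[1]
    joined = " ".join(words)
    # `line[1] = ...` would raise TypeError because tuples are immutable;
    # slice assignment replaces the list's contents instead.
    words[:] = to_visual(joined).split()
    return line


def naive_reverse(s):
    """Stand-in for a bidi conversion; ASCII input keeps the demo readable."""
    return s[::-1]


line = (12.5, ["abc", "de"])
original_list = line[1]
flip_tuple_line(line, naive_reverse)
print(line)
```

Because the list object is reused, any other reference to it elsewhere sees the reordered words as well.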



I attached an archive containing a Hebrew font, and a test script.
The script should test all the cases I know of that my patch should
deal with, and includes an image of the expected results for comparison.

Moshe


On Fri, Jun 5, 2009 at 4:37 PM, Andy Robinson<andy at reportlab.com> wrote:

> 2009/6/5 Moshe Wagner <moshe.wagner at gmail.com>:
>> Is there any chance this could be added to the official code?
>
> Moshe, thanks very much for your contribution.  We're happy in
> principle to add this kind of patch but it would help a great deal if
> you could produce two more things...
>
> (a) a suitable few paragraphs for us to put in a Whats New page or the
> user guide.  Mention what pyfribidi is, what version is needed (if it
> matters) and where to get it.  Also mention what one needs to install
> to view these things - do we need special fonts, Acrobat Language
> packs etc..?   Assume the reader knows nothing about RTL.     Just
> send text to me or the list and I'll add it to the docs and/or web
> site.
>
> (b) most important of all, a small test script (see our 'tests'
> folder) which generates some Hebrew and/or Arabic output, which we can
> run and look at.   The absolute ideal test script would have a bitmap
> of the correct Hebrew to look at, and say "the text below should look
> like the above", since I at least would not know if it was backwards
> or forwards ;-)
>
> Most people in ReportLab are too busy to have been following this in
> detail but we'd really welcome any improvement in this area. We are
> also starting from zero knowledge of Hebrew and Arabic - unlike Asian
> text which we deal with daily.  There will be a release in a few weeks
> and this would be a very valuable addition...
>
> Best Regards,
>
> --
> Andy Robinson
> CEO/Chief Architect
> ReportLab Europe Ltd.
> Media House, 3 Palmerston Road, Wimbledon, London SW19 1PG, UK
> Tel +44-20-8545-1570
> _______________________________________________
> reportlab-users mailing list
> reportlab-users at reportlab.com
> http://two.pairlist.net/mailman/listinfo/reportlab-users
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Report Lab RTL Test.tar.gz
Type: application/x-gzip
Size: 58586 bytes
Desc: not available
Url : <http://two.pairlist.net/pipermail/reportlab-users/attachments/20090607/58467d08/attachment-0001.bin>

