[reportlab-users] Hebrew Support Patch

Moshe Wagner moshe.wagner at gmail.com
Mon Jun 8 04:42:43 EDT 2009


Well, I believe I have fixed the handling of mixed texts.

The problem was that fribidi must be given a base direction for every line.
For instance, the line "hello world!" stays intact when passed to
fribidi with an LTR base direction, but should become "!hello world"
with an RTL base direction, since we want to treat it as part of an RTL
paragraph, where the end of the line is on the left.
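The effect of the base direction can be reproduced with a toy model of the reordering. This is a deliberately simplified sketch. It is not fribidi and not the full Unicode Bidirectional Algorithm, and it makes these assumptions:

* Only Hebrew letters are strong RTL, only ASCII letters are strong LTR, and everything else is neutral.
* Each run of neutrals takes the direction of its strong neighbours when they agree, and otherwise falls back to the base direction.
* Runs are then reversed by embedding level, highest level first.

The name `toy_log2vis` is hypothetical.

```python
def char_dir(ch):
    """Toy classification: 'R' for Hebrew letters, 'L' for ASCII letters, None for neutrals."""
    if "\u05d0" <= ch <= "\u05ea":
        return "R"
    if ch.isascii() and ch.isalpha():
        return "L"
    return None


def toy_log2vis(text, base):
    """Reorder one line from logical to visual order; base is 'L' or 'R'."""
    dirs = [char_dir(ch) for ch in text]

    # Resolve each run of neutrals: if the strong neighbours agree, use that
    # direction, otherwise use the base direction. Line start/end count as base.
    resolved = list(dirs)
    i = 0
    while i < len(dirs):
        if dirs[i] is not None:
            i += 1
            continue
        j = i
        while j < len(dirs) and dirs[j] is None:
            j += 1
        before = dirs[i - 1] if i > 0 else base
        after = dirs[j] if j < len(dirs) else base
        for k in range(i, j):
            resolved[k] = before if before == after else base
        i = j

    # Embedding levels: LTR base is level 0 (RTL chars at 1);
    # RTL base is level 1 (LTR chars at 2).
    if base == "L":
        levels = [1 if d == "R" else 0 for d in resolved]
    else:
        levels = [2 if d == "L" else 1 for d in resolved]

    # From the highest level down to 1, reverse every maximal run at or above it.
    chars = list(text)
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)


print(toy_log2vis("hello world!", "L"))  # hello world!
print(toy_log2vis("hello world!", "R"))  # !hello world
```

With an RTL base, the trailing "!" sits between an LTR letter and the end of the line. It therefore takes the base direction and moves to the left, which is exactly the behaviour described above.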

Until now, I used fribidi's auto-detection for the base direction, but
since I applied it to every line separately, a line like the one above
was always treated as LTR, ignoring the fact that it belongs to an
RTL paragraph.

The new code now handles this by determining the base direction from
the paragraph's alignment and giving that direction to all lines in the
paragraph.

The only problem is with filled (justified) paragraphs, where the base
direction can't be determined from the alignment, since the text could
still be either RTL or LTR.
The best solution is to allow two types of filled alignment,
FILL_LEFT and FILL_RIGHT. These are needed anyway to position the
last line correctly, as I mentioned before.
Could that be done?
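The proposed rule can be sketched as a small pure function. A few assumptions apply:

* The TA_* values below mirror the constants in reportlab.lib.enums. They are copied rather than imported, so the sketch runs without ReportLab.
* FILL_LEFT and FILL_RIGHT are the hypothetical alignments proposed above. They do not exist in ReportLab.
* The guessing step is passed in as a callable, standing in for guessBaseDirection.
* Plain strings replace pyfribidi's LTR/RTL constants.

As in the patch, centred text also falls through to guessing.

```python
# Values as defined in reportlab.lib.enums (copied here, not imported).
TA_LEFT, TA_CENTER, TA_RIGHT, TA_JUSTIFY = 0, 1, 2, 4
# Hypothetical alignments proposed in this message; not part of ReportLab.
FILL_LEFT, FILL_RIGHT = "FILL_LEFT", "FILL_RIGHT"

LTR, RTL = "LTR", "RTL"


def base_direction(alignment, guess, first_text):
    """Pick one base direction for every line of a paragraph."""
    if alignment in (TA_LEFT, FILL_LEFT):
        return LTR
    if alignment in (TA_RIGHT, FILL_RIGHT):
        return RTL
    # Centred or plain justified: the alignment says nothing, so guess.
    return guess(first_text)


def last_line_alignment(alignment):
    """Where the last (unjustified) line of a paragraph would be placed."""
    if alignment == FILL_RIGHT:
        return TA_RIGHT
    if alignment in (FILL_LEFT, TA_JUSTIFY):
        return TA_LEFT  # TA_JUSTIFY currently behaves like a left fill
    return alignment
```

With FILL_LEFT and FILL_RIGHT available, the guess would only be needed for centred text.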


Anyway, until that can be done, I made the code use fribidi's
auto-detection for the base direction of filled paragraphs, so it is
decided by their first character.
It isn't ideal, but it works.

So here is the new code:
######################
# Hebrew text patch, Moshe Wagner, June 2009
# <moshe.wagner at gmail.com>

# This code fixes paragraphs with RTL text.
# It does so by flipping each line separately
# (depending on the type of the line).

# If pyfribidi can't be imported, it does nothing.
# Plain LTR text is not affected in any case.
import sys

try:
    import pyfribidi
except ImportError:
    sys.stderr.write("Fribidi module not found; you will not "
                     "have RTL support for this paragraph\n")
else:
    # First, decide the base direction to give pyfribidi.
    # Left- and right-aligned paragraphs get it from their alignment.
    #
    # For now there is only one type of filled (justified) paragraph.
    # Even though it acts like a left-aligned one, we cannot assume
    # that is the alignment the text should have, so the direction
    # is guessed from the first character of the text.
    if self.style.alignment == TA_LEFT:
        direction = pyfribidi.LTR
    elif self.style.alignment == TA_RIGHT:
        direction = pyfribidi.RTL
    else:
        # Take the first two bytes of the text: in a UTF-8 byte string
        # a Hebrew letter is two bytes long. Slicing (rather than
        # indexing [0] and [1]) also copes with one-byte words.
        c = ""
        if blPara.lines:
            first = blPara.lines[0]
            if isinstance(first, (FragLine, ParaLines)) and first.words:
                c = first.words[0].text[:2]
            elif isinstance(first, tuple) and first[1]:
                # first[1] is the list of words, so take the start of
                # the first word, not the first two words.
                c = first[1][0][:2]
        # Guess the direction from it:
        direction = self.guessBaseDirection(c)

    for line in blPara.lines:
        if isinstance(line, (FragLine, ParaLines)):
            # For a FragLine or ParaLines, the text attribute of each
            # word is flipped. Then the order of the words is reversed
            # too, so that two word parts on the same line end up in
            # the right order.
            for word in line.words:
                word.text = pyfribidi.log2vis(word.text, direction)
            line.words.reverse()

        elif isinstance(line, tuple):
            # Here the line is a tuple whose second item is the list of
            # words. Since the tuple itself can't be changed, the words
            # are joined, flipped, and written back into that list.
            s = ' '.join(line[1])
            s = pyfribidi.log2vis(s, direction)
            line[1][:] = s.split()
        else:
            sys.stderr.write(line.__class__.__name__ + "\n")
######################
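The two line shapes handled above can be mimicked with plain Python objects. This shows why the tuple case writes back with slice assignment: the tuple is immutable, but the word list it holds is not, so `line[1][:] = ...` updates the very list the tuple refers to. The sketch rests on these assumptions:

* `flip` is a stand-in for pyfribidi.log2vis that simply reverses the string. This matches log2vis only for purely Hebrew text.
* The SimpleNamespace objects stand in for FragLine words.
* The first tuple item is an arbitrary number.

```python
from types import SimpleNamespace


def flip(text):
    # Stand-in for pyfribidi.log2vis(text, RTL); correct only for pure RTL text.
    return text[::-1]


def flip_frag_line(line):
    """FragLine/ParaLines style: flip each word, then reverse the word order."""
    for word in line.words:
        word.text = flip(word.text)
    line.words.reverse()


def flip_tuple_line(line):
    """Tuple style: (number, [words]); rewrite the word list in place."""
    s = flip(" ".join(line[1]))
    line[1][:] = s.split()


frag = SimpleNamespace(words=[SimpleNamespace(text="שלום"),
                              SimpleNamespace(text="לכם")])
flip_frag_line(frag)
print([w.text for w in frag.words])  # ['םכל', 'םולש']

words = ["שלום", "לכם"]
line = (0.0, words)
flip_tuple_line(line)
print(line[1] is words, line[1])  # True ['םכל', 'םולש']
```

Both routes give the same visual word list. The tuple route keeps the original list object, so anything holding a reference to it sees the change.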

And this method should be added to the same class, before the 'wrap' method:
###############################
# Guesses the direction the given text should have (LTR or RTL), for
# cases where it can't be decided by its alignment.
def guessBaseDirection(self, s):
    # Since pyfribidi doesn't have an option to return fribidi's guess,
    # I have to find out its guess in a rather ugly way.
    #
    # A neutral sign ('.') is appended to the given text, and the text
    # is reordered with base direction ON, letting fribidi guess its
    # direction. If the text is RTL, the added sign moves to the start
    # (the left end) of the result; if it's LTR, the sign stays at the
    # end. Checking the end also makes empty or neutral-only text LTR.
    import pyfribidi

    s += '.'
    s = pyfribidi.log2vis(s, pyfribidi.ON)

    if s.endswith('.'):
        return pyfribidi.LTR
    else:
        return pyfribidi.RTL
###############################
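As an alternative to the '.' trick, the base direction can be guessed directly from the Unicode bidirectional class of each character. The first strong character decides, which is the idea behind rules P2/P3 of the Unicode Bidirectional Algorithm and behind fribidi's auto-detection. This sketch uses a different technique from the one the patch uses, and makes these assumptions:

* It works on unicode text, so under Python 2 a UTF-8 byte string would need decoding first.
* It ignores the isolate and embedding subtleties that full P2 handles.
* The name `guess_base_direction` is hypothetical.

```python
import unicodedata

LTR, RTL = "LTR", "RTL"


def guess_base_direction(text, default=LTR):
    """Return the direction of the first strong character, or `default`."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return LTR
        if bidi_class in ("R", "AL"):  # AL covers Arabic letters
            return RTL
    # No strong character at all (e.g. only digits, spaces or punctuation).
    return default


print(guess_base_direction("שלום"))      # RTL
print(guess_base_direction("123 שלום"))  # RTL: digits are weak, not strong
print(guess_base_direction("hello"))     # LTR
```

This does not depend on where fribidi places a trailing neutral, and text with no strong character falls back to an explicit default.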


Moshe

Another point I got wrong: contrary to what I said before, pyfribidi
*DOES* require fribidi itself.


On Sun, Jun 7, 2009 at 2:12 PM, Moshe Wagner<moshe.wagner at gmail.com> wrote:
> Looking at my test again, I see there is still a bug with mixed texts.
> I'll update when I fix it.
>
> Moshe
>
> On Sun, Jun 7, 2009 at 2:09 PM, Moshe Wagner<moshe.wagner at gmail.com> wrote:
>> Well, I'm not a really great technical writer either, and I'm not
>> really sure how much background you want, but I'll give the best
>> description I can think of. As I said, feel free to change anything so
>> it meets your requirements, or ask me for more information on any
>> point you think I didn't go into enough.
>>
>> (Note: I don't know enough about other languages, so I'm strictly
>> speaking about Hebrew. Arabic, for instance, is very similar in terms
>> of being RTL, but has a few very different properties, such as joined
>> letters. I do believe fribidi deals with that correctly, and therefore
>> my patch should add Arabic support too, but I cannot promise that. Is
>> there anyone who can test this?)
>>
>> Displaying Hebrew -
>>
>> The first step in displaying any non-ASCII characters, Hebrew
>> included, is obtaining a font that contains them. I didn't check all
>> of the default PDF fonts, but those I did check did not include
>> Hebrew glyphs.
>> Instead, I use fonts from the Culmus project (http://culmus.sourceforge.net).
>> (In the test archive I included one font from there, and its license
>> file. I hope that's OK.)
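With ReportLab, such a TrueType font is registered once and then selected by name. This sketch uses ReportLab's pdfmetrics.registerFont and TTFont calls. The font name and file path in the usage comment are hypothetical placeholders, not files from the attached archive.

```python
def register_hebrew_font(name, ttf_path):
    """Register a TrueType font (e.g. one from Culmus) under `name`."""
    # Imported lazily so this module loads even where ReportLab is absent.
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.ttfonts import TTFont

    pdfmetrics.registerFont(TTFont(name, ttf_path))
    return name


# Usage (placeholder name and path):
# font = register_hebrew_font("HebrewFont", "/path/to/culmus-font.ttf")
# canvas.setFont(font, 12)
```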

>>
>> 'Visual' and 'Logical' ordering -
>>
>> Once a font with the characters is used, single Hebrew characters can
>> be displayed, but words will still come out mirrored, as I'll explain.
>> Say we take the word hello, that is, "שלום" ("Shalom"). If you see it
>> correctly, you will see the character "ש" at the rightmost end of
>> the word, as it's the first letter and Hebrew is read from right to
>> left.
>> But if we look at the word as an array of chars, 'c_str', the
>> values would be:
>> c_str[0] = "ש", c_str[1] = "ל", c_str[2] = "ו" and c_str[3] = "ם".
>>
>> So when printed from left to right, as is usually done, the word will
>> be shown as:
>> "םולש",
>> since the characters are printed in their real order (called
>> 'logical'), but starting from the wrong side.
>>
>> To avoid this, a 'visual' ordering is used instead of the 'logical' one.
>> So the word "שלום" will be stored as:
>> c_str[0] = "ם", c_str[1] = "ו", c_str[2] = "ל" and c_str[3] = "ש", so
>> when printed from left to right, it will be displayed as
>> "שלום" - which is the correct order.
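In Python terms, the logical string is what you type. For a purely Hebrew word the visual form is simply the reversed string. Plain reversal is only correct for pure RTL text; mixed text needs the full bidi algorithm.

```python
logical = "שלום"
print(list(logical))  # ['ש', 'ל', 'ו', 'ם']: c_str[0] is the first letter

visual = logical[::-1]
print(list(visual))   # ['ם', 'ו', 'ל', 'ש']: stored so left-to-right drawing reads correctly

# Drawing `visual` left to right puts the first letter 'ש' at the right-hand end.
assert visual[-1] == logical[0]
```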

>>
>> The visual ordering must be used carefully, though: when printing
>> text that's split across several lines, mirroring the whole text also
>> reverses which words end up on which line, so the lines come out in
>> the wrong order.
>> (I.e., "שלום לכם", with each word on a separate line, will be
>> םולש
>> םכל
>> in logical ordering,
>> and:
>> לכם
>> שלום
>> in visual, which are both wrong.)
>> The solution is to mirror every line on its own, which must be
>> done in the wrapping function, not before or after it.
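The ordering problem can be shown with a tiny greedy line wrapper. Mirroring first hands the last word to the first line, while wrapping the logical text first and then mirroring each line keeps the lines in order. The sketch assumes:

* `wrap` and `mirror` are hypothetical helpers for illustration only.
* `mirror` is a plain reversal, so the sketch only models purely RTL text.

```python
def wrap(words, width):
    """Greedy wrap: pack words into lines of at most `width` characters."""
    lines, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        if len(candidate) <= width or not current:
            current = candidate  # fits (or a single over-long word)
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def mirror(line):
    # Stand-in for log2vis on a purely RTL line.
    return line[::-1]


logical = "שלום לכם"

# Wrong: mirror the whole text, then wrap; the line order comes out swapped.
wrong = wrap(mirror(logical).split(), width=4)

# Right: wrap the logical text, then mirror each line on its own.
right = [mirror(line) for line in wrap(logical.split(), width=4)]

print(wrong)  # ['םכל', 'םולש']: first line holds the second word
print(right)  # ['םולש', 'םכל']: first line holds the first word
```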

>>
>> Fribidi and Pyfribidi -
>>
>> fribidi (http://fribidi.org/), "An implementation of the Unicode
>> Bidirectional Algorithm (bidi)", is a library that converts between
>> 'logical' and 'visual' ordering. It checks whether the text is RTL
>> before mirroring it, and supports mixed texts, where only the RTL
>> part should be mirrored.
>>
>> The Python binding for this library is called pyfribidi -
>> http://pyfribidi.sourceforge.net/ . (It does not require fribidi
>> itself to be installed.)
>> All versions of pyfribidi should work fine, but I suppose the newest
>> version should always be used.
>>
>> My code -
>> My code simply uses pyfribidi to add RTL (and mixed LTR and RTL
>> string) support to ReportLab.
>> In "canvas.py" it simply runs 'pyfribidi.log2vis' on the text, while
>> in "paragraph.py" it does so for each line separately.
>>
>> This is what I have added to the "canvas.py" file, right at the
>> beginning of the "drawString" function:
>> ######################
>> # Hebrew text patch, Moshe Wagner, June 2009
>> # <moshe.wagner at gmail.com>
>>
>> # Flips the given text with pyfribidi, if needed (i.e. Hebrew or Arabic).
>> # If it could not be imported, it does nothing.
>> # Plain LTR texts will not be affected in any case.
>> try:
>>     import pyfribidi
>>     text = pyfribidi.log2vis(text, base_direction=pyfribidi.ON)
>> except ImportError:
>>     import sys
>>     print >> sys.stderr, "Fribidi module not found; You will not have RTL support for this paragraph"
>> #####################
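One way to structure the canvas.py change is to attempt the import once at module load and keep a no-op fallback. A missing pyfribidi then produces a single warning instead of one per drawString call. This is a sketch of that design, not the patch itself:

* `visual_text` is a hypothetical helper.
* The only pyfribidi features used are log2vis with base_direction and the ON constant, both of which the patch above already uses.

```python
import sys

try:
    import pyfribidi  # optional; needs the fribidi library itself
except ImportError:
    pyfribidi = None
    sys.stderr.write("pyfribidi not found; RTL text will not be reordered\n")


def visual_text(text):
    """Return `text` in visual order, or unchanged if pyfribidi is missing."""
    if pyfribidi is None:
        return text
    # ON lets fribidi pick the base direction from the text itself.
    return pyfribidi.log2vis(text, base_direction=pyfribidi.ON)


# Plain LTR text comes back unchanged either way.
print(visual_text("hello"))
```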

>>
>> And this is what I added to "paragraph.py", in the "wrap" function,
>> right after the call to "self.breakLines":
>> ######################
>> # Hebrew text patch, Moshe Wagner, June 2009
>> # <moshe.wagner at gmail.com>
>>
>> # This code fixes paragraphs with RTL text.
>>
>> # It does so by flipping each line separately
>> # (depending on the type of the line).
>>
>> # If pyfribidi can't be imported, it does nothing.
>> # Plain LTR texts will not be affected in any case.
>>
>> try:
>>     import pyfribidi
>> except ImportError:
>>     import sys
>>     print >> sys.stderr, "Fribidi module not found; You will not have RTL support for this paragraph"
>> else:
>>     for line in blPara.lines:
>>         if isinstance(line, (FragLine, ParaLines)):
>>             # When the line is a FragLine or ParaLines, the text
>>             # attribute of each of its words is flipped.
>>             # Then the order of the words is reversed too,
>>             # so that two word parts on the same line
>>             # will be in the right order.
>>             for word in line.words:
>>                 word.text = pyfribidi.log2vis(word.text, base_direction=pyfribidi.ON)
>>
>>             line.words.reverse()
>>
>>         elif isinstance(line, tuple):
>>             # When the line is just a tuple whose second value is the text:
>>             # since I couldn't change its value directly,
>>             # it's done by merging the words, flipping them,
>>             # and writing them back one by one into the second item.
>>
>>             s = ' '.join(line[1])
>>             s = pyfribidi.log2vis(s, base_direction=pyfribidi.ON)
>>             line[1][:] = s.split()
>>         else:
>>             print line.__class__.__name__
>> ######################

>>
>> I attached an archive containing a Hebrew font and a test script.
>> The script should test all the cases I know my patch should deal
>> with, and includes an image of good results for comparison.
>>
>> Moshe
>>

>> On Fri, Jun 5, 2009 at 4:37 PM, Andy Robinson<andy at reportlab.com> wrote:
>>> 2009/6/5 Moshe Wagner <moshe.wagner at gmail.com>:
>>>> Is there any chance this could be added to the official code?
>>>
>>> Moshe, thanks very much for your contribution.  We're happy in
>>> principle to add this kind of patch but it would help a great deal if
>>> you could produce two more things...
>>>
>>> (a) a suitable few paragraphs for us to put in a What's New page or the
>>> user guide.  Mention what pyfribidi is, what version is needed (if it
>>> matters) and where to get it.  Also mention what one needs to install
>>> to view these things - do we need special fonts, Acrobat Language
>>> packs etc..?   Assume the reader knows nothing about RTL.     Just
>>> send text to me or the list and I'll add it to the docs and/or web
>>> site.
>>>
>>> (b) most important of all, a small test script (see our 'tests'
>>> folder) which generates some Hebrew and/or Arabic output, which we can
>>> run and look at.   The absolute ideal test script would have a bitmap
>>> of the correct Hebrew to look at, and say "the text below should look
>>> like the above", since I at least would not know if it was backwards
>>> or forwards ;-)
>>>
>>> Most people in ReportLab are too busy to have been following this in
>>> detail but we'd really welcome any improvement in this area. We are
>>> also starting from zero knowledge of Hebrew and Arabic - unlike Asian
>>> text which we deal with daily.  There will be a release in a few weeks
>>> and this would be a very valuable addition...
>>>
>>> Best Regards,
>>>
>>> --
>>> Andy Robinson
>>> CEO/Chief Architect
>>> ReportLab Europe Ltd.
>>> Media House, 3 Palmerston Road, Wimbledon, London SW19 1PG, UK
>>> Tel +44-20-8545-1570
>>> _______________________________________________
>>> reportlab-users mailing list
>>> reportlab-users at reportlab.com
>>> http://two.pairlist.net/mailman/listinfo/reportlab-users
>>>
>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Report Lab RTL Test.tar.gz
Type: application/x-gzip
Size: 100254 bytes
Desc: not available
Url : <http://two.pairlist.net/pipermail/reportlab-users/attachments/20090608/0f419508/attachment-0001.bin>

