[reportlab-users] Hebrew Support Patch

Moshe Wagner moshe.wagner at gmail.com
Mon Jun 8 05:00:27 EDT 2009


Sorry, it seems I posted the code with a stupid bug. This is how it
should be, of course:
(The other function is still fine)
######################
# Hebrew text patch, Moshe Wagner, June 2009
# <moshe.wagner at gmail.com>

# This code fixes paragraphs with RTL text.

# It does so by flipping each line separately
# (depending on the type of the line).

# If pyfribidi can't be imported, it does nothing.
# Plain LTR texts will not be affected in any case.
try:
    import pyfribidi
except ImportError:
    import sys
    print >> sys.stderr, "Fribidi module not found; You will not have RTL support for this paragraph"
else:

    # First, the base direction given to pyfribidi must be decided.
    # In justified paragraphs, it's decided by their alignment.

    # For now, there is only one type of fill-justified paragraph.
    # So even though it acts like a left-justified one,
    # we cannot assume that's the alignment the text should have.
    # So the direction is guessed from the first character of the text.

    if self.style.alignment == TA_LEFT:
        direction = pyfribidi.LTR
    elif self.style.alignment == TA_RIGHT:
        direction = pyfribidi.RTL
    else:
        # Fall back to LTR if no first character can be found:
        direction = pyfribidi.LTR
        # Get first character of the text:
        c = ""
        if isinstance(blPara.lines[0], (FragLine, ParaLines)):
            if len(blPara.lines[0].words[0].text) < 2:
                # This must be English, because Unicode chars take up 2
                # spaces in the array
                direction = pyfribidi.LTR
            else:
                c = (blPara.lines[0].words[0].text[0] +
                     blPara.lines[0].words[0].text[1])
        elif isinstance(blPara.lines[0], tuple):
            if len(blPara.lines[0][1]) < 2:
                # This must be English, because Unicode chars take up 2
                # spaces in the array
                direction = pyfribidi.LTR
            else:
                c = blPara.lines[0][1][0] + blPara.lines[0][1][1]
        # Guess direction by it. Skip the guess when c is empty,
        # so the LTR choice made above is not overwritten:
        if c:
            direction = self.guessBaseDirection(c)

    for line in blPara.lines:
        if isinstance(line, (FragLine, ParaLines)):
            # When the line is a FragLine or ParaLines, the text
            # attribute of each of its words is flipped.
            # Then the order of the words is flipped too,
            # so that 2 word parts on the same line
            # will be in the right order.

            for word in line.words:
                word.text = pyfribidi.log2vis(word.text, direction)

            line.words.reverse()

        elif isinstance(line, tuple):
            # When the line is just a tuple whose second value is the text:
            # since I couldn't directly change its value,
            # it's done by merging the words, flipping them,
            # and re-entering them one by one into the second attribute.

            s = ' '.join(line[1])
            s = pyfribidi.log2vis(s, direction)
            line[1][:] = s.split()
        else:
            print line.__class__.__name__
######################
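
For anyone who wants to play with the direction choice described in the comments above without installing pyfribidi, here is a minimal sketch. It is not part of the patch: the alignment constants and both function names are hypothetical stand-ins, and it assumes Python 3 with decoded (unicode) text, using the standard unicodedata module to find the first strongly directional character instead of looking at the first two bytes.

```python
import unicodedata

# Hypothetical stand-ins for the alignment constants used in the patch.
ALIGN_LEFT, ALIGN_RIGHT, ALIGN_JUSTIFY = "left", "right", "justify"


def first_strong_direction(text, default="LTR"):
    """Return "RTL" or "LTR" based on the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):   # Hebrew and Arabic letters
            return "RTL"
        if bidi_class == "L":           # Latin and other LTR letters
            return "LTR"
    return default                      # only digits, spaces or punctuation


def base_direction(alignment, text):
    """Alignment decides the direction; otherwise guess from the text."""
    if alignment == ALIGN_LEFT:
        return "LTR"
    if alignment == ALIGN_RIGHT:
        return "RTL"
    return first_strong_direction(text)
```

With this sketch, a right-aligned paragraph is always treated as RTL, while a justified one is decided by its first letter, so leading digits or punctuation do not affect the guess.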


Also, I forgot to point out the archive I attached with my last email
includes the new test script for the patch.

Sorry again,
Moshe

On Mon, Jun 8, 2009 at 11:42 AM, Moshe Wagner<moshe.wagner at gmail.com> wrote:

> Well, I believe I fixed the system dealing with mixed texts.
>
> The problem was that fribidi must have a base direction given for any line.
> For instance, the line "hello world!" will stay intact when given to
> fribidi if the base direction is LTR, but should become "!hello
> world" if the base direction is RTL, since we want to look at it as
> part of an RTL paragraph, and therefore the end of the line is on the
> left.
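
To make the effect of the base direction concrete, here is a rough sketch of a much simplified reordering for a single line. It is only an illustration under stated assumptions, not fribidi's algorithm: the function names are made up, digits are treated like punctuation, and bracket mirroring and nested embeddings are ignored. It assumes Python 3 and unicode text.

```python
import unicodedata


def _strong(ch):
    """Classify a character as 'L', 'R', or None (neutral or weak)."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):
        return "R"
    return None


def simple_log2vis(text, base):
    """Very simplified logical-to-visual reordering of one line.

    base is "LTR" or "RTL". A neutral between two strong characters of the
    same direction joins them; any other neutral takes the base direction.
    """
    base_dir = "L" if base == "LTR" else "R"
    strong = [_strong(ch) for ch in text]

    # Resolve neutrals from the nearest strong neighbours on each side.
    resolved = []
    for i, d in enumerate(strong):
        if d is None:
            before = next((s for s in reversed(strong[:i]) if s), base_dir)
            after = next((s for s in strong[i + 1:] if s), base_dir)
            d = before if before == after else base_dir
        resolved.append(d)

    # Group consecutive characters of the same direction into runs.
    runs = []
    for ch, d in zip(text, resolved):
        if runs and runs[-1][0] == d:
            runs[-1][1] += ch
        else:
            runs.append([d, ch])

    # In an RTL line the runs are laid out right to left; RTL runs are
    # always mirrored, LTR runs keep their internal order.
    if base_dir == "R":
        runs.reverse()
    return "".join(chunk[::-1] if d == "R" else chunk for d, chunk in runs)
```

Calling simple_log2vis("hello world!", "LTR") leaves the line intact, while simple_log2vis("hello world!", "RTL") gives "!hello world", which is the behaviour described above.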

>
> Until now, I used fribidi's auto detection for the base direction, but
> since I did it for every line separately, it would always treat a line
> like the one I gave above as LTR, ignoring the fact that it's in an
> RTL paragraph.
>
> The new code deals with that now by determining the base direction from
> the alignment type of the paragraph, and giving it to all lines in the
> paragraph.
>
> The only problem is with filled paragraphs, where the base direction
> can't be determined, since it still could be either RTL or LTR.
> The best solution is to allow two types of filled alignments,
> FILL_LEFT and FILL_RIGHT, which is needed anyway for positioning the
> last line correctly, as I mentioned before.
> Could that be done?
>
> Anyway, until that can be done, I made the code use fribidi's auto
> detection for the base direction of filled paragraphs, so it's decided
> by their first character.
> It isn't ideal, but it works.
>
> So here is the new code:
> ######################
> # Hebrew text patch, Moshe Wagner, June 2009
> # <moshe.wagner at gmail.com>
>
> #This code fixes paragraphs with RTL text
>
> # It does it by flipping each line separately.
> #       (Depending on the type of the line)
>
> # If fribidi cant be imported, it does nothing
> # Plain LTR texts will not be affected in any case.
> try:
>     import pyfribidi
> except ImportError:
>     import sys
>     print >> sys.stderr, "Fribidi module not found; You will not have RTL support for this paragraph"
> else:
>
>     #First, the base direction given to pyfribidi must be decided.
>     # In justified paragraphs, it's decided by their alignment.
>
>     # For now, there is only one type of fill justified paragraphs.
>     # So even though it acts like a left justified one,
>     # we cannot assume that's the alignment the text should have.
>     # So the direction is guessed by the first character of the text.
>
>     if self.style.alignment == TA_LEFT:
>         direction = pyfribidi.LTR
>     elif self.style.alignment == TA_RIGHT:
>         direction = pyfribidi.RTL
>     else:
>         # Get first character of the text:
>         c = ""
>         if isinstance(blPara.lines[0], (FragLine, ParaLines)):
>             c = (blPara.lines[0].words[0].text[0] +
>                  blPara.lines[0].words[0].text[1])
>         elif isinstance(blPara.lines[0], tuple):
>             c = blPara.lines[0][1][0] + blPara.lines[0][1][1]
>         #Guess direction by it:
>         direction = self.guessBaseDirection(c)
>
>     for line in blPara.lines:
>         if isinstance(line, (FragLine, ParaLines)):
>             #When the line is a FragLine or ParaLines, Its
>             #text attribute of each of it's words is flipped.
>             #Then, the order of the words is flipped too,
>             #So that 2 word parts on the same line
>             #will be in the right order
>
>             for word in line.words:
>                 word.text = pyfribidi.log2vis(word.text,direction)
>
>             line.words.reverse()
>
>         elif isinstance(line, tuple):
>             #When the line is just a tuple whose second value is the text.
>             #since I coulden't directly change it's value,
>             #it's done by merging the words, flipping them,
>             #and re-entering them one by one to the second attribute
>
>             s = ' '.join(line[1])
>             s = pyfribidi.log2vis(s,direction)
>             line[1][:] = s.split()
>         else:
>             print line.__class__.__name__
> ######################
>
> And this function should be added before the 'wrap' function in the same class:
> ###############################
> # Guesses the direction the given text should have (LTR or RTL), for
> # cases where it can't be decided by its alignment
> def guessBaseDirection(self, s):
>     # Since pyfribidi doesn't have an option to return fribidi's guess,
>     # I have to find out its guess in a very ugly way.
>
>     # This adds a neutral sign to the given text.
>     # Then the text is mirrored, letting fribidi guess its direction.
>     # If it's RTL text, the added sign will now become the first
>     # character of the text,
>     # while if it's LTR the sign will stay at the end.
>     import pyfribidi
>
>     s += '.'
>     s = pyfribidi.log2vis(s, pyfribidi.ON)
>
>     if s[0] == ".":
>         return pyfribidi.RTL
>     else:
>         return pyfribidi.LTR
> ###############################
>
>
> Moshe
>
> Another point I got wrong is that pyfribidi *DOES* require fribidi
> itself, not like I said before.
>
>
> On Sun, Jun 7, 2009 at 2:12 PM, Moshe Wagner<moshe.wagner at gmail.com> wrote:
>> Looking at my test again, I see there is still a bug with mixed texts.
>> I'll update when I fix it.
>>
>> Moshe
>>
>> On Sun, Jun 7, 2009 at 2:09 PM, Moshe Wagner<moshe.wagner at gmail.com> wrote:
>>> Well, I'm not a really great technical writer either, and I'm not
>>> really sure how much background you want. But I'll give the best
>>> description I can think of. As I said, feel free to change anything so
>>> it meets your requirements, or ask me to give more information on any
>>> point you think I didn't go into enough.
>>>
>>> (Note: I don't know enough about any other languages, so I'm strictly
>>> speaking about Hebrew. Arabic, for instance, is very similar in terms
>>> of being RTL, but has a few very different properties, such as joined
>>> letters. I do believe fribidi deals with that correctly, and therefore
>>> my patch should add Arabic support too, but I cannot promise that. Is
>>> there anyone who can test this?)
>>>
>>> Displaying Hebrew -
>>>
>>> The first step for displaying any non-ASCII characters, and therefore
>>> Hebrew as well, is obtaining a font containing its characters. I
>>> didn't check all of the default PDF fonts, but those I did check did
>>> not include Hebrew glyphs.
>>> Instead, I use fonts from the Culmus project (http://culmus.sourceforge.net).
>>> (In the test archive I included one font from there, and its license
>>> file. I hope that's OK.)
>>>
>>>
>>> 'Visual' and 'Logical' ordering -
>>>
>>> Once a font with the characters is used, single Hebrew characters can
>>> be displayed, but the words will still come out mirrored, as I'll
>>> explain.
>>> Say we take the word hello, that is, "שלום" ("Shalom"). If you see it
>>> correctly, you will see the character "ש" at the rightmost part of
>>> that word, as it's the first letter, and Hebrew is read from right to
>>> left.
>>> But if we look at the word as an array of chars, 'c_str', the
>>> values would be:
>>> c_str[0] = "ש", c_str[1] = "ל", c_str[2] = "ו" and c_str[3] = "ם".
>>>
>>> So when printed from left to right, as is usually done, the word will
>>> be shown as:
>>> "םולש",
>>> since the characters are printed in their real order (called
>>> 'logical'), but start from the wrong side.
>>>
>>> To avoid this, a 'visual' ordering is used instead of the 'logical' one.
>>> So the word "שלום" will be stored as -
>>> c_str[0] = "ם", c_str[1] = "ו", c_str[2] = "ל" and c_str[3] = "ש", so
>>> when printed from left to right, it will be displayed as
>>> "שלום" - which is the correct order.
>>>
>>> The visual ordering must be used carefully, though, since when
>>> printing text that's split across a few lines, it will cause their
>>> order to switch too, as mirroring affects both axes.
>>> (i.e., "שלום לכם", when each word is on a separate line, will be
>>> םולש
>>> םכל
>>> in logical ordering,
>>> and:
>>> לכם
>>> שלום
>>> in visual, which are both wrong.)
>>> The solution is to mirror every line on its own, and that must be
>>> done in the wrapping function, not before or after it.
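
The line-order problem described above can be reproduced with a small sketch. It assumes purely RTL text, so a plain string reversal stands in for the real logical-to-visual conversion, and the helper names mirror and wrap are made up for this illustration (Python 3, unicode text).

```python
def mirror(text):
    """Stand-in for a logical-to-visual conversion of purely RTL text."""
    return text[::-1]


def wrap(text, width):
    """Greedy word wrap: pack words into lines of at most `width` characters."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


logical = "שלום לכם"

# Mirroring the whole paragraph before wrapping swaps the line order:
# the first line now holds the second word.
mirrored_then_wrapped = wrap(mirror(logical), width=4)

# Wrapping the logical text first and mirroring each line on its own
# keeps the lines in reading order.
wrapped_then_mirrored = [mirror(line) for line in wrap(logical, width=4)]
```

In the second result the first line still holds the first word, stored so that "ש" is drawn last, that is, at the right edge when the characters are printed from left to right.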

>>>
>>>
>>> Fribidi and Pyfribidi -
>>>
>>> A library that converts between 'logical' and 'visual' ordering,
>>> tests whether the text is RTL before mirroring it, and supports
>>> mixed texts, where only the RTL part should be mirrored, is fribidi -
>>> "An implementation of the Unicode Bidirectional Algorithm (bidi)." -
>>> http://fribidi.org/.
>>>
>>> The Python binding for this library is called pyfribidi -
>>> http://pyfribidi.sourceforge.net/ . (It does not require fribidi
>>> itself installed.)
>>> All versions of pyfribidi should work fine, but I suppose the newest
>>> version should always be used.
>>>
>>> My code -
>>> My code simply uses pyfribidi to add support for RTL strings (and
>>> mixed LTR and RTL strings) to reportlab.
>>> In "canvas.py" it simply runs 'pyfribidi.log2vis' on the text, while
>>> in "paragraph.py" it does so for each line separately.
>>>
>>> This is what I added to the "canvas.py" file, right at the beginning
>>> of the "drawString" function:
>>> ######################
>>> # Hebrew text patch, Moshe Wagner, June 2009
>>> # <moshe.wagner at gmail.com>
>>>
>>> # Flips the given text with pyfribidi, if it's needed (i.e. Hebrew or Arabic)
>>> # If it could not be imported, it does nothing
>>> # Plain LTR texts will not be affected in any case.
>>> try:
>>>     import pyfribidi
>>>     text = pyfribidi.log2vis(text, base_direction=pyfribidi.ON)
>>>
>>> except ImportError:
>>>     import sys
>>>     print >> sys.stderr, "Fribidi module not found; You will not have RTL support for this paragraph"
>>> #####################
>>>
>>> And this is what I added to "paragraph.py", in the "wrap" function,
>>> right after the call to "self.breakLines":
>>> ######################
>>> # Hebrew text patch, Moshe Wagner, June 2009
>>> # <moshe.wagner at gmail.com>
>>>
>>> # This code fixes paragraphs with RTL text.
>>>
>>> # It does so by flipping each line separately
>>> # (depending on the type of the line).
>>>
>>> # If pyfribidi can't be imported, it does nothing.
>>> # Plain LTR texts will not be affected in any case.
>>>
>>> try:
>>>     import pyfribidi
>>> except ImportError:
>>>     import sys
>>>     print >> sys.stderr, "Fribidi module not found; You will not have RTL support for this paragraph"
>>> else:
>>>     for line in blPara.lines:
>>>         if isinstance(line, (FragLine, ParaLines)):
>>>             # When the line is a FragLine or ParaLines, the text
>>>             # attribute of each of its words is flipped.
>>>             # Then the order of the words is flipped too,
>>>             # so that 2 word parts on the same line
>>>             # will be in the right order.
>>>             for word in line.words:
>>>                 word.text = pyfribidi.log2vis(word.text, base_direction=pyfribidi.ON)
>>>
>>>             line.words.reverse()
>>>
>>>         elif isinstance(line, tuple):
>>>             # When the line is just a tuple whose second value is the text:
>>>             # since I couldn't directly change its value,
>>>             # it's done by merging the words, flipping them,
>>>             # and re-entering them one by one into the second attribute.
>>>
>>>             s = ' '.join(line[1])
>>>             s = pyfribidi.log2vis(s, base_direction=pyfribidi.ON)
>>>             line[1][:] = s.split()
>>>         else:
>>>             print line.__class__.__name__
>>> ######################
>>>
>>> I attached an archive containing a Hebrew font, and a test script.
>>> The script should test all cases I know that my patch should deal
>>> with, and adds an image of good results for comparison.
>>>
>>> Moshe
>>>
>>>
>>> On Fri, Jun 5, 2009 at 4:37 PM, Andy Robinson<andy at reportlab.com> wrote:
>>>> 2009/6/5 Moshe Wagner <moshe.wagner at gmail.com>:
>>>>> Is there any chance this could be added to the official code?
>>>>
>>>> Moshe, thanks very much for your contribution.  We're happy in
>>>> principle to add this kind of patch but it would help a great deal if
>>>> you could produce two more things...
>>>>
>>>> (a) a suitable few paragraphs for us to put in a What's New page or the
>>>> user guide.  Mention what pyfribidi is, what version is needed (if it
>>>> matters) and where to get it.  Also mention what one needs to install
>>>> to view these things - do we need special fonts, Acrobat Language
>>>> packs etc.?   Assume the reader knows nothing about RTL.     Just
>>>> send text to me or the list and I'll add it to the docs and/or web
>>>> site.
>>>>
>>>> (b) most important of all, a small test script (see our 'tests'
>>>> folder) which generates some Hebrew and/or Arabic output, which we can
>>>> run and look at.   The absolute ideal test script would have a bitmap
>>>> of the correct Hebrew to look at, and say "the text below should look
>>>> like the above", since I at least would not know if it was backwards
>>>> or forwards ;-)
>>>>
>>>> Most people in ReportLab are too busy to have been following this in
>>>> detail but we'd really welcome any improvement in this area. We are
>>>> also starting from zero knowledge of Hebrew and Arabic - unlike Asian
>>>> text which we deal with daily.  There will be a release in a few weeks
>>>> and this would be a very valuable addition...
>>>>
>>>> Best Regards,
>>>>
>>>> --
>>>> Andy Robinson
>>>> CEO/Chief Architect
>>>> ReportLab Europe Ltd.
>>>> Media House, 3 Palmerston Road, Wimbledon, London SW19 1PG, UK
>>>> Tel +44-20-8545-1570