[reportlab-users] Hebrew Support Patch

Moshe Wagner moshe.wagner at gmail.com
Sun Jun 7 07:12:12 EDT 2009


Looking at my test again, I see there is still a bug with mixed texts.
I'll update when I fix it.

Moshe

On Sun, Jun 7, 2009 at 2:09 PM, Moshe Wagner<moshe.wagner at gmail.com> wrote:

> Well, I'm not a really great technical writer either, and I'm not

> really sure how much background you want. But I'll give the best

> description I can think of. As I said, feel free to change anything so

> it meets your requirements, or ask me to give more information on any

> point you think I didn't get into enough.

>

> (Note: I don't know enough about any other languages, so I'm strictly

> speaking about Hebrew. Arabic, for instance, is very similar in terms

> of being RTL, but has a few very different properties, such as joined

> letters. I do believe fribidi deals with that correctly, and therefore

> my patch should add Arabic support too, but I cannot promise that. Is

> there anyone who can test this?  )

>

> Displaying Hebrew -

>

> The first step in displaying any non-ASCII characters, Hebrew
> included, is obtaining a font that contains their glyphs. I didn't
> check all of the default PDF fonts, but the ones I did check do not
> include Hebrew glyphs.
> Instead, I use fonts from the Culmus project (http://culmus.sourceforge.net).
> (In the test archive I included one font from there, and its license
> file. I hope that's OK.)

>

>

> 'Visual' and 'Logical' ordering -

>

> Once a font containing the characters is in use, single Hebrew
> characters can be displayed, but words will still come out mirrored,
> as I'll explain.
> Take the word for hello, "שלום" ("Shalom"). If it is displayed
> correctly, you will see the character "ש" at the rightmost end of the
> word, since it's the first letter and Hebrew is read from right to
> left.
> But if we look at the word as an array of chars, 'c_str', the
> values are:
> c_str[0] = "ש", c_str[1] = "ל", c_str[2] = "ו" and c_str[3]="ם".

>

> So when printed from left to right, as is usually done, the word will
> be shown as:
> "םולש",
> since the characters are printed in their stored order (called
> 'logical'), but starting from the wrong side.

>

> To avoid this, a 'visual' ordering is used instead of the 'logical' one.

> So the word "שלום" will be stored as -

> c_str[0] = "ם", c_str[1] = "ו", c_str[2] = "ל" and c_str[3]="ש", so

> when printed from left to right, it will be displayed as

> "שלום" - which is the correct order.

>
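The logical-to-visual conversion described above can be sketched in plain Python for the simplest case. This is only an illustration, assuming the string is purely RTL (no Latin text, digits or punctuation); the helper name naive_visual is made up for this example and is not part of the patch, which relies on pyfribidi instead.

```python
# Escapes are used so the source does not depend on how an editor
# renders right-to-left text.
SHIN, LAMED, VAV, FINAL_MEM = "\u05e9", "\u05dc", "\u05d5", "\u05dd"

# Logical order: the order in which the letters are read and stored.
logical = SHIN + LAMED + VAV + FINAL_MEM  # "Shalom"


def naive_visual(text):
    """Reverse a purely RTL string so that a left-to-right renderer
    draws the first letter at the right-hand end.

    Assumption: text contains no LTR runs; mixed text needs a real
    bidi implementation.
    """
    return text[::-1]


visual = naive_visual(logical)
# visual now starts with the final mem and ends with the shin.
```

Reversing a second time gives back the logical string, which is why applying the conversion twice by mistake silently undoes it.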

> The visual ordering must be used carefully, though. When the text is
> split across several lines, reversing the whole text also reverses
> the order of the lines, since mirroring affects both axes.
> (i.e., "שלום לכם", with each word on a separate line, will be
> םולש
> םכל
> in logical ordering,
> and:
> לכם
> שלום
> in visual ordering, which are both wrong.)
> The solution is to mirror every line on its own, and that must be
> done inside the wrapping function, once the line breaks are known,
> not before or after it.
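The ordering problem with wrapped text can be reproduced with a small sketch. It assumes purely RTL text and uses plain string reversal as a stand-in for a real bidi call; wrap_words and layout_rtl are hypothetical helpers written for this example, not reportlab functions.

```python
def wrap_words(words, max_chars):
    """Greedy wrap: pack words into lines of at most max_chars characters."""
    lines, current = [], []
    for word in words:
        candidate = " ".join(current + [word])
        if current and len(candidate) > max_chars:
            lines.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(" ".join(current))
    return lines


def layout_rtl(text, max_chars):
    """Wrap in logical order first, then mirror each line on its own."""
    return [line[::-1] for line in wrap_words(text.split(), max_chars)]


SHALOM = "\u05e9\u05dc\u05d5\u05dd"  # shin, lamed, vav, final mem
LACHEM = "\u05dc\u05db\u05dd"        # lamed, kaf, final mem
text = SHALOM + " " + LACHEM

# Correct: the first word stays on the first line, each line mirrored.
per_line = layout_rtl(text, 4)

# Wrong: mirroring the whole text before wrapping swaps the lines.
whole_text_first = wrap_words(text[::-1].split(), 4)
```

Here per_line keeps "Shalom" on the first line, while whole_text_first puts the second word first, which is the second failure case described above.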

>

>

> Fribidi and Pyfribidi -

>

> A library that converts between 'logical' and 'visual' ordering is
> fribidi. It checks whether the text is RTL before mirroring it, and
> it supports mixed texts, where only the RTL part should be mirrored.
> "An implementation of the Unicode Bidirectional Algorithm (bidi)." -
> http://fribidi.org/.

>

> The Python binding for this library is called pyfribidi -
> http://pyfribidi.sourceforge.net/ . (It does not require fribidi
> itself to be installed.)
> All versions of pyfribidi should work fine, but I suppose the newest
> version should always be used.
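A minimal sketch of calling pyfribidi with a graceful fallback, in the same spirit as the patch below. The wrapper name to_visual is an assumption made for this example; only pyfribidi.log2vis and pyfribidi.ON come from the patch itself.

```python
try:
    import pyfribidi
except ImportError:
    # Without the binding, text is passed through untouched, so plain
    # LTR documents keep working; RTL text will come out mirrored.
    pyfribidi = None


def to_visual(text):
    """Return text in visual order, or unchanged if pyfribidi is missing.

    base_direction=pyfribidi.ON lets the bidi algorithm work out the
    paragraph direction from the text itself.
    """
    if pyfribidi is None:
        return text
    return pyfribidi.log2vis(text, base_direction=pyfribidi.ON)


ascii_line = to_visual("hello")
```

Plain LTR input such as "hello" should come back unchanged whether or not pyfribidi is installed.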

>

> My code -

> My code simply uses pyfribidi to add RTL (and mixed LTR and RTL
> strings) support to reportlab.
> In "canvas.py" it simply runs 'pyfribidi.log2vis' on the text, while
> in "paragraph.py" it does so for each line separately.
>
> This is what I added to the "canvas.py" file, right at the beginning
> of the "drawString" function:

> ######################
> # Hebrew text patch, Moshe Wagner, June 2009
> # <moshe.wagner at gmail.com>
>
> # Flips the given text with pyfribidi, if needed (i.e. Hebrew or Arabic).
> # If pyfribidi cannot be imported, it does nothing.
> # Plain LTR texts will not be affected in any case.
> try:
>     import pyfribidi
>     text = pyfribidi.log2vis(text, base_direction=pyfribidi.ON)
> except ImportError:
>     import sys
>     sys.stderr.write("Fribidi module not found; You will not have RTL "
>                      "support for this paragraph\n")
> #####################

>

> And this is what I added to "paragraph.py", in the "wrap" function,

> right after the call to "self.breakLines":

> ######################
> # Hebrew text patch, Moshe Wagner, June 2009
> # <moshe.wagner at gmail.com>
>
> # This code fixes paragraphs with RTL text.
> # It does so by flipping each line separately
> # (depending on the type of the line).
> # If pyfribidi can't be imported, it does nothing.
> # Plain LTR texts will not be affected in any case.
>
> try:
>     import pyfribidi
> except ImportError:
>     import sys
>     sys.stderr.write("Fribidi module not found; You will not have RTL "
>                      "support for this paragraph\n")
> else:
>     for line in blPara.lines:
>         if isinstance(line, (FragLine, ParaLines)):
>             # When the line is a FragLine or ParaLines, the text
>             # attribute of each of its words is flipped.
>             # Then the order of the words is flipped too,
>             # so that two word parts on the same line
>             # end up in the right order.
>             for word in line.words:
>                 word.text = pyfribidi.log2vis(word.text, base_direction=pyfribidi.ON)
>             line.words.reverse()
>
>         elif isinstance(line, tuple):
>             # When the line is just a tuple whose second value is the
>             # list of words: the tuple itself can't be changed, so the
>             # words are joined, flipped as one string, and written back
>             # into that list in place.
>             s = ' '.join(line[1])
>             s = pyfribidi.log2vis(s, base_direction=pyfribidi.ON)
>             line[1][:] = s.split()
>         else:
>             # Unexpected line type: print its class name for debugging.
>             print(line.__class__.__name__)
> ######################

>

>

>

> I attached an archive containing a Hebrew font and a test script.
> The script should test all the cases I know of that my patch should
> deal with, and includes an image of the expected results for comparison.

>

> Moshe

>

>

> On Fri, Jun 5, 2009 at 4:37 PM, Andy Robinson<andy at reportlab.com> wrote:

>> 2009/6/5 Moshe Wagner <moshe.wagner at gmail.com>:

>>> Is there any chance this could be added to the official code?

>>

>> Moshe, thanks very much for your contribution.  We're happy in

>> principle to add this kind of patch but it would help a great deal if

>> you could produce two more things...

>>

>> (a) a suitable few paragraphs for us to put in a Whats New page or the

>> user guide.  Mention what pyfribidi is, what version is needed (if it

>> matters) and where to get it.  Also mention what one needs to install

>> to view these things - do we need special fonts, Acrobat Language

>> packs etc..?   Assume the reader knows nothing about RTL.     Just

>> send text to me or the list and I'll add it to the docs and/or web

>> site.

>>

>> (b) most important of all, a small test script (see our 'tests'

>> folder) which generates some Hebrew and/or Arabic output, which we can

>> run and look at.   The absolute ideal test script would have a bitmap

>> of the correct Hebrew to look at, and say "the text below should look

>> like the above", since I at least would not know if it was backwards

>> or forwards ;-)

>>

>> Most people in ReportLab are too busy to have been following this in

>> detail but we'd really welcome any improvement in this area. We are

>> also starting from zero knowledge of Hebrew and Arabic - unlike Asian

>> text which we deal with daily.  There will be a release in a few weeks

>> and this would be a very valuable addition...

>>

>> Best Regards,

>>

>> --

>> Andy Robinson

>> CEO/Chief Architect

>> ReportLab Europe Ltd.

>> Media House, 3 Palmerston Road, Wimbledon, London SW19 1PG, UK

>> Tel +44-20-8545-1570


>>

>


