[reportlab-users] Hebrew Support Patch
Moshe Wagner
moshe.wagner at gmail.com
Sun Jun 7 07:12:12 EDT 2009
Looking at my test again, I see there is still a bug with mixed texts.
I'll update when I fix it.
Moshe
On Sun, Jun 7, 2009 at 2:09 PM, Moshe Wagner<moshe.wagner at gmail.com> wrote:
> Well, I'm not a really great technical writer either, and I'm not
> really sure how much background you want. But I'll give the best
> description I can think of. As I said, feel free to change anything so
> it meets your requirements, or ask me to give more information on any
> point you think I didn't get into enough.
>
> (Note: I don't know enough about any other languages, so I'm strictly
> speaking about Hebrew. Arabic, for instance, is very similar in terms
> of being RTL, but has a few very different properties, such as joined
> letters. I do believe fribidi deals with that correctly, and therefore
> my patch should add Arabic support too, but I cannot promise that. Is
> there anyone who can test this? )
>
> Displaying Hebrew -
>
> The first step in displaying any non-ASCII characters, Hebrew
> included, is obtaining a font that contains those characters. I
> didn't check all of the default PDF fonts, but the ones I did check
> did not include Hebrew glyphs.
> Instead, I use fonts from the Culmus project (http://culmus.sourceforge.net).
> (In the test archive I included one font from there, along with its
> license file. I hope that's OK.)
>
>
> 'Visual' and 'Logical' ordering -
>
> Once a font with the characters is used, single Hebrew characters can
> be displayed, but the words will still come out mirrored, as I'll
> explain.
> Take the word for hello, "שלום" ("Shalom"). If it is displayed
> correctly, you will see the character "ש" at the rightmost position of
> the word, as it's the first letter and Hebrew is read from right to
> left.
> But if we look at the word as an array of chars, 'c_str', the
> values are:
> c_str[0] = "ש", c_str[1] = "ל", c_str[2] = "ו" and c_str[3]="ם".
>
> So when printed from left to right, as is usually done, the word will
> be shown as:
> "םולש",
> since the characters are printed in their real order (called
> 'logical'), but starting from the wrong side.
>
> To avoid this, a 'visual' ordering is used instead of the 'logical' one.
> So the word "שלום" will be stored as -
> c_str[0] = "ם", c_str[1] = "ו", c_str[2] = "ל" and c_str[3]="ש", so
> when printed from left to right, it will be displayed as
> "שלום" - which is the correct order.
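
For anyone who prefers to see the two orderings in code, here is a minimal plain-Python sketch with no reportlab or fribidi involved. Reversing a purely Hebrew word is only a stand-in for real bidi reordering, and the helper name `naive_visual` is made up for this illustration.

```python
# Logical order: characters stored in the order they are read.
logical = "\u05e9\u05dc\u05d5\u05dd"  # shin, lamed, vav, final mem ("Shalom")

def naive_visual(word):
    """Reverse a purely RTL word so a left-to-right renderer shows it correctly.

    Hypothetical helper for illustration only. It is wrong for mixed
    LTR/RTL text, which needs the full bidi algorithm.
    """
    return word[::-1]

visual = naive_visual(logical)

# Index 0 of the logical string is the first letter read (shin);
# index 0 of the visual string is the last letter read (final mem).
print([hex(ord(ch)) for ch in logical])
print([hex(ord(ch)) for ch in visual])
```

Applying the reversal twice gives back the logical string, which is why it matters to know which ordering a given string is in.
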
>
> The visual ordering must be used carefully, though: when printing
> text that is split across several lines, it will switch the order of
> the lines too, since mirroring affects both axes.
> (I.e. "שלום לכם", with each word on a separate line, will be
> םולש
> םכל
> In logical ordering,
> and:
> לכם
> שלום
> In visual ordering, which are both wrong.)
> The solution is to mirror every line on its own, and that must be
> done inside the wrapping function, not before or after it.
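
The line-order problem can be reproduced with a short plain-Python sketch. String reversal stands in for real bidi reordering here, and the list `lines` stands in for whatever the wrapping function produces; both are assumptions made for illustration.

```python
# Two Hebrew words in logical order, already split into lines by a wrapper.
lines = ["\u05e9\u05dc\u05d5\u05dd", "\u05dc\u05db\u05dd"]  # "Shalom", "lachem"

# Flipping the whole text before wrapping reverses the line order as well:
# the second word ends up on the first line.
whole = " ".join(lines)[::-1]
flipped_then_wrapped = whole.split(" ")

# Flipping each line after wrapping keeps the lines in reading order.
wrapped_then_flipped = [line[::-1] for line in lines]

print(flipped_then_wrapped)
print(wrapped_then_flipped)
```

The second form is the one that corresponds to doing the flip per line inside the wrapping step.
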
>
>
> Fribidi and Pyfribidi -
>
> fribidi is a library that converts between 'logical' and 'visual'
> ordering. It tests whether the text is RTL before mirroring it, and
> supports mixed texts, where only the RTL part should be mirrored.
> It is "An implementation of the Unicode Bidirectional Algorithm (bidi)." -
> http://fribidi.org/.
>
> The Python binding for this library is called pyfribidi -
> http://pyfribidi.sourceforge.net/ . (It does not require fribidi
> itself to be installed.)
> All versions of pyfribidi should work fine, but I suppose the newest
> version should always be used.
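
A hedged sketch of calling pyfribidi directly, outside reportlab. It uses only `pyfribidi.log2vis` and `pyfribidi.ON`, the same call and constant as the patch. The wrapper name `to_visual` is invented here, and returning the text unchanged when the module is missing is an assumption about reasonable fallback behaviour.

```python
try:
    import pyfribidi
except ImportError:
    pyfribidi = None  # RTL reordering is unavailable

def to_visual(text):
    """Return text in visual order, or unchanged if pyfribidi is missing.

    to_visual is a hypothetical wrapper, not part of pyfribidi or reportlab.
    """
    if pyfribidi is None:
        return text
    # base_direction=ON asks fribidi to work out the base direction itself.
    return pyfribidi.log2vis(text, base_direction=pyfribidi.ON)

# Plain LTR text should come back as it went in, with or without pyfribidi.
print(to_visual("hello"))
```
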
>
> My code -
> My code simply uses pyfribidi to add support for RTL strings (and
> mixed LTR and RTL strings) to reportlab.
> In "canvas.py" it simply runs 'pyfribidi.log2vis' on the text, while
> in "paragraph.py" it does so for each line separately.
>
> This is what I added to the "canvas.py" file, right at the beginning
> of the "drawString" function:
> ######################
> # Hebrew text patch, Moshe Wagner, June 2009
> # <moshe.wagner at gmail.com>
>
> # Flips the given text with pyfribidi, if it's needed (i.e. Hebrew or Arabic)
> # If it could not be imported, it does nothing
> # Plain LTR texts will not be affected in any case.
> try:
>     import pyfribidi
>     text = pyfribidi.log2vis(text, base_direction=pyfribidi.ON)
> except ImportError:
>     import sys
>     sys.stderr.write("Fribidi module not found; you will not have RTL "
>                      "support for this paragraph\n")
> #####################
>
> And this is what I added to "paragraph.py", in the "wrap" function,
> right after the call to "self.breakLines":
> ######################
> # Hebrew text patch, Moshe Wagner, June 2009
> # <moshe.wagner at gmail.com>
>
> # This code fixes paragraphs with RTL text.
>
> # It does so by flipping each line separately
> # (depending on the type of the line).
>
> # If pyfribidi can't be imported, it does nothing.
> # Plain LTR texts will not be affected in any case.
>
> try:
>     import pyfribidi
> except ImportError:
>     import sys
>     sys.stderr.write("Fribidi module not found; you will not have RTL "
>                      "support for this paragraph\n")
> else:
>     for line in blPara.lines:
>         if isinstance(line, (FragLine, ParaLines)):
>             # When the line is a FragLine or ParaLines, the text
>             # attribute of each of its words is flipped.
>             # Then the order of the words is flipped too,
>             # so that two word parts on the same line
>             # will be in the right order.
>             for word in line.words:
>                 word.text = pyfribidi.log2vis(word.text, base_direction=pyfribidi.ON)
>
>             line.words.reverse()
>
>         elif isinstance(line, tuple):
>             # When the line is just a tuple whose second value is the text:
>             # since I couldn't directly change its value,
>             # it's done by merging the words, flipping them,
>             # and re-entering them one by one into the second value.
>
>             s = ' '.join(line[1])
>             s = pyfribidi.log2vis(s, base_direction=pyfribidi.ON)
>             line[1][:] = s.split()
>         else:
>             # Unexpected line type: print its class name for debugging.
>             print(line.__class__.__name__)
> ######################
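
The tuple branch relies on a Python detail: an item of a tuple cannot be reassigned, but a list stored inside the tuple can be modified in place with slice assignment. A minimal sketch, where `flip` is a made-up stand-in for `pyfribidi.log2vis` and the `(0, [...])` tuple only imitates the shape described in the patch comments:

```python
def flip(s):
    # Hypothetical stand-in for pyfribidi.log2vis: reverse the characters.
    return s[::-1]

words = ["abc", "def"]
line = (0, words)  # assumed shape: (some number, list of words)

# line[1] = [...] would raise TypeError because tuples are immutable,
# but the list object inside the tuple can still be modified in place.
line[1][:] = flip(" ".join(line[1])).split()

print(line)
```

Because the list is changed in place, any other reference to it (here `words`) sees the flipped words too.
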
>
>
>
> I attached an archive containing a Hebrew font and a test script.
> The script should test all the cases I know of that my patch should
> deal with, and includes an image of the expected results for comparison.
>
> Moshe
>
>
> On Fri, Jun 5, 2009 at 4:37 PM, Andy Robinson<andy at reportlab.com> wrote:
>> 2009/6/5 Moshe Wagner <moshe.wagner at gmail.com>:
>>> Is there any chance this could be added to the official code?
>>
>> Moshe, thanks very much for your contribution. We're happy in
>> principle to add this kind of patch but it would help a great deal if
>> you could produce two more things...
>>
>> (a) a suitable few paragraphs for us to put in a Whats New page or the
>> user guide. Mention what pyfribidi is, what version is needed (if it
>> matters) and where to get it. Also mention what one needs to install
>> to view these things - do we need special fonts, Acrobat Language
>> packs etc..? Assume the reader knows nothing about RTL. Just
>> send text to me or the list and I'll add it to the docs and/or web
>> site.
>>
>> (b) most important of all, a small test script (see our 'tests'
>> folder) which generates some Hebrew and/or Arabic output, which we can
>> run and look at. The absolute ideal test script would have a bitmap
>> of the correct Hebrew to look at, and say "the text below should look
>> like the above", since I at least would not know if it was backwards
>> or forwards ;-)
>>
>> Most people in ReportLab are too busy to have been following this in
>> detail but we'd really welcome any improvement in this area. We are
>> also starting from zero knowledge of Hebrew and Arabic - unlike Asian
>> text which we deal with daily. There will be a release in a few weeks
>> and this would be a very valuable addition...
>>
>> Best Regards,
>>
>> --
>> Andy Robinson
>> CEO/Chief Architect
>> ReportLab Europe Ltd.
>> Media House, 3 Palmerston Road, Wimbledon, London SW19 1PG, UK
>> Tel +44-20-8545-1570
>> _______________________________________________
>> reportlab-users mailing list
>> reportlab-users at reportlab.com
>> http://two.pairlist.net/mailman/listinfo/reportlab-users
>>
>