[reportlab-users] Hebrew Support Patch
Moshe Wagner
moshe.wagner at gmail.com
Mon Jun 8 05:00:27 EDT 2009
Sorry, it seems I posted the code with a stupid bug. This is how it
should be, of course:
(The other function, guessBaseDirection, is still fine.)
######################
# Hebrew text patch, Moshe Wagner, June 2009
# <moshe.wagner at gmail.com>
# This code fixes paragraphs with RTL text.
# It does so by flipping each line separately
# (depending on the type of the line).
# If fribidi can't be imported, it does nothing.
# Plain LTR texts will not be affected in any case.
try:
    import pyfribidi
except ImportError:
    import sys
    print >> sys.stderr, "Fribidi module not found; You will not have RTL support for this paragraph"
else:
    # First, the base direction given to pyfribidi must be decided.
    # In left- or right-aligned paragraphs, it's decided by their alignment.
    # For now, there is only one type of fill-justified paragraph.
    # Even though it acts like a left-aligned one,
    # we cannot assume that's the direction the text should have,
    # so the direction is guessed from the first character of the text.
    if self.style.alignment == TA_LEFT:
        direction = pyfribidi.LTR
    elif self.style.alignment == TA_RIGHT:
        direction = pyfribidi.RTL
    else:
        # Get the first character of the text (its first 2 bytes).
        # Slicing avoids an IndexError on one-letter words.
        c = ""
        if isinstance(blPara.lines[0], (FragLine, ParaLines)):
            c = blPara.lines[0].words[0].text[:2]
        elif isinstance(blPara.lines[0], tuple):
            # The second item of a tuple line is its list of words,
            # so take the start of the first word, not the first two words.
            c = blPara.lines[0][1][0][:2]
        if len(c) < 2:
            # This must be English, because a Hebrew character
            # takes up 2 bytes in a UTF-8 string
            direction = pyfribidi.LTR
        else:
            # Guess the direction from it (only here, so the LTR
            # decision above is not overwritten):
            direction = self.guessBaseDirection(c)
    for line in blPara.lines:
        if isinstance(line, (FragLine, ParaLines)):
            # When the line is a FragLine or ParaLines, the text
            # attribute of each of its words is flipped.
            # Then the order of the words is flipped too,
            # so that two word parts on the same line
            # will be in the right order.
            for word in line.words:
                word.text = pyfribidi.log2vis(word.text, direction)
            line.words.reverse()
        elif isinstance(line, tuple):
            # When the line is just a tuple whose second value holds the words:
            # since the tuple item can't be replaced directly,
            # it's done by merging the words, flipping them,
            # and putting them back one by one into the second item.
            s = ' '.join(line[1])
            s = pyfribidi.log2vis(s, direction)
            line[1][:] = s.split()
        else:
            print line.__class__.__name__
######################
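The FragLine branch above (flip each fragment, then reverse the fragment order) can be tried outside ReportLab. A minimal sketch, assuming hypothetical `Word` and `FragLine` stand-ins for ReportLab's line objects, and a toy `flip()` in place of `pyfribidi.log2vis` that is only correct for pure RTL text:

```python
class Word(object):
    """Hypothetical stand-in for a word fragment with a .text attribute."""
    def __init__(self, text):
        self.text = text


class FragLine(object):
    """Hypothetical stand-in for a line made of several fragments."""
    def __init__(self, words):
        self.words = words


def flip(text):
    # Toy replacement for pyfribidi.log2vis: for pure RTL text the
    # visual order is simply the reversed logical order.
    return text[::-1]


def flip_frag_line(line):
    # Flip every fragment on its own...
    for word in line.words:
        word.text = flip(word.text)
    # ...then reverse the fragment order, so the last logical fragment
    # is drawn first when the line is laid out left to right.
    line.words.reverse()


# "בוקר טוב" ("good morning") split into two fragments,
# e.g. a bold word followed by a plain one.
line = FragLine([Word("בוקר "), Word("טוב")])
flip_frag_line(line)
drawn = "".join(w.text for w in line.words)
```

Concatenating the flipped fragments gives the same string as flipping the whole line at once; without the `reverse()` call the two words would be drawn in swapped positions.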
Also, I forgot to mention that the archive attached to my last email
includes the new test script for the patch.
Sorry again,
Moshe
On Mon, Jun 8, 2009 at 11:42 AM, Moshe Wagner<moshe.wagner at gmail.com> wrote:
> Well, I believe I have fixed the handling of mixed texts.
>
> The problem was that fribidi must be given a base direction for every line.
> For instance, the line "hello world!", will stay intact when given to
> fribidi if the base direction is LTR, but should become "!hello
> world", if the base direction is RTL, since we want to look at it as
> part of a RTL paragraph, and therefore the end of the line is on the
> left.
>
> Until now, I used fribidi's auto detection for the base direction, but
> since I did it for every line separately, it would always treat a line
> like the one I gave above as LTR, ignoring the fact that it's in an
> RTL paragraph.
>
> The new code deals with that now by determining the base direction by
> the alignment type of the paragraph, and giving it to all lines in the
> paragraph.
>
> The only problem is with filled paragraphs, where the base direction
> can't be determined, since it still could be either RTL or LTR.
> The best solution is to allow two types of filled alignments,
> FILL_LEFT and FILL_RIGHT, which are needed anyway for positioning the
> last line correctly, as I mentioned before.
> Could that be done?
>
>
> Anyway, until that can be done, I made the code use fribidi's auto
> detection for the base direction of filled paragraphs, so it's decided
> by their first character.
> It isn't ideal, but it works.
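The rule described above (the alignment decides the base direction, and only plain justified text falls back to guessing) can be sketched as a small lookup. This is a hypothetical illustration: the alignment names are plain strings rather than ReportLab's TA_* constants, and "fill_left"/"fill_right" model the proposed FILL_LEFT and FILL_RIGHT alignments, which ReportLab does not provide.

```python
# Hypothetical alignment names; "fill_left" and "fill_right" stand for
# the proposed FILL_LEFT / FILL_RIGHT alignments, not existing ones.
FIXED_DIRECTIONS = {
    "left": "LTR",
    "right": "RTL",
    "fill_left": "LTR",
    "fill_right": "RTL",
}


def base_direction(alignment, guess):
    """Return "LTR" or "RTL" for a paragraph.

    guess is a zero-argument callable that is only invoked when the
    alignment says nothing about direction (plain justified text).
    """
    if alignment in FIXED_DIRECTIONS:
        return FIXED_DIRECTIONS[alignment]
    return guess()
```

Passing the guess as a callable keeps it lazy: with the two fill variants available, a filled paragraph would never need to inspect its text at all.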
>
> So here is the new code:
> ######################
> # Hebrew text patch, Moshe Wagner, June 2009
> # <moshe.wagner at gmail.com>
>
> # This code fixes paragraphs with RTL text.
>
> # It does so by flipping each line separately
> # (depending on the type of the line).
>
> # If fribidi can't be imported, it does nothing.
> # Plain LTR texts will not be affected in any case.
> try:
>     import pyfribidi
> except ImportError:
>     import sys
>     print >> sys.stderr, "Fribidi module not found; You will not have RTL support for this paragraph"
> else:
>
>     # First, the base direction given to pyfribidi must be decided.
>     # In left- or right-aligned paragraphs, it's decided by their alignment.
>
>     # For now, there is only one type of fill-justified paragraph.
>     # So even though it acts like a left-aligned one,
>     # we cannot assume that's the direction the text should have.
>     # So the direction is guessed from the first character of the text.
>
>     if self.style.alignment == TA_LEFT:
>         direction = pyfribidi.LTR
>     elif self.style.alignment == TA_RIGHT:
>         direction = pyfribidi.RTL
>     else:
>         # Get the first character of the text:
>         c = ""
>         if isinstance(blPara.lines[0], (FragLine, ParaLines)):
>             c = blPara.lines[0].words[0].text[0] + blPara.lines[0].words[0].text[1]
>         elif isinstance(blPara.lines[0], tuple):
>             c = blPara.lines[0][1][0] + blPara.lines[0][1][1]
>         # Guess the direction from it:
>         direction = self.guessBaseDirection(c)
>
>     for line in blPara.lines:
>         if isinstance(line, (FragLine, ParaLines)):
>             # When the line is a FragLine or ParaLines, the text
>             # attribute of each of its words is flipped.
>             # Then the order of the words is flipped too,
>             # so that two word parts on the same line
>             # will be in the right order.
>
>             for word in line.words:
>                 word.text = pyfribidi.log2vis(word.text, direction)
>
>             line.words.reverse()
>
>         elif isinstance(line, tuple):
>             # When the line is just a tuple whose second value holds the words:
>             # since I couldn't change its value directly,
>             # it's done by merging the words, flipping them,
>             # and putting them back one by one into the second item.
>
>             s = ' '.join(line[1])
>             s = pyfribidi.log2vis(s, direction)
>             line[1][:] = s.split()
>         else:
>             print line.__class__.__name__
> ######################
>
> And this function should be added before the 'wrap' function in the same class:
> ###############################
> # Guesses the direction the given text should have (LTR or RTL), for
> # cases where it can't be decided by its alignment
> def guessBaseDirection(self, s):
>     # Since pyfribidi doesn't have an option to return fribidi's guess,
>     # I have to find out its guess in a very ugly way.
>
>     # This adds a neutral sign to the given text.
>     # Then the text is mirrored, letting fribidi guess its direction.
>     # If it's RTL text, the added sign will now become the first
>     # character of the text, while if it's LTR the sign will stay at the end.
>     import pyfribidi
>
>     s += '.'
>     s = pyfribidi.log2vis(s, pyfribidi.ON)
>
>     if s[0] == ".":
>         return pyfribidi.RTL
>     else:
>         return pyfribidi.LTR
> ###############################
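The same guess can be made without the "." probe by applying the "first strong character" rule of the Unicode Bidirectional Algorithm (rules P2/P3 of UAX #9) directly with Python's `unicodedata` module. A minimal sketch, not what the patch does: it returns the strings "LTR"/"RTL" instead of pyfribidi constants, expects a Unicode string (Python 3 `str`), ignores the isolate controls that P2 skips over, and returns an explicit default when the text has no strong character at all.

```python
import unicodedata


def guess_base_direction(text, default="LTR"):
    """Guess a paragraph direction from its first strong character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "LTR"
        if bidi in ("R", "AL"):  # Hebrew letters are R, Arabic letters AL
            return "RTL"
    # Empty text, or only digits, spaces and punctuation.
    return default
```

Leading digits and punctuation are weak or neutral, so `guess_base_direction("123 שלום")` still answers "RTL".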
>
>
> Moshe
>
> Another point I got wrong: pyfribidi *DOES* require fribidi
> itself, contrary to what I said before.
>
>
> On Sun, Jun 7, 2009 at 2:12 PM, Moshe Wagner<moshe.wagner at gmail.com> wrote:
>> Looking at my test again, I see there is still a bug with mixed texts.
>> I'll update when I fix it.
>>
>> Moshe
>>
>> On Sun, Jun 7, 2009 at 2:09 PM, Moshe Wagner<moshe.wagner at gmail.com> wrote:
>>> Well, I'm not a really great technical writer either, and I'm not
>>> really sure how much background you want. But I'll give the best
>>> description I can think of. As I said, feel free to change anything so
>>> it meets your requirements, or ask me to give more information on any
>>> point you think I didn't get into enough.
>>>
>>> (Note: I don't know enough about any other languages, so I'm strictly
>>> speaking about Hebrew. Arabic, for instance, is very similar in terms
>>> of being RTL, but has a few very different properties, such as joined
>>> letters. I do believe fribidi deals with that correctly, and therefore
>>> my patch should add Arabic support too, but I cannot promise that. Is
>>> there anyone who can test this? )
>>>
>>> Displaying Hebrew -
>>>
>>> The first step in displaying any non-ASCII characters, and therefore
>>> Hebrew as well, is obtaining a font containing its characters. I
>>> didn't check all of the default PDF fonts, but those I did check did
>>> not include Hebrew glyphs.
>>> Instead, I use fonts from the Culmus project (http://culmus.sourceforge.net).
>>> (In the test archive I included one font from there, and its license
>>> file. I hope that's OK.)
>>>
>>>
>>> 'Visual' and 'Logical' ordering -
>>>
>>> Once a font with the characters is used, single Hebrew characters can
>>> be displayed, but the words will still come out mirrored, as I'll
>>> explain.
>>> Say we take the word for hello, "שלום" ("Shalom"). If it is displayed
>>> correctly, you will see the character "ש" at the rightmost end of
>>> the word, as it's the first letter, and Hebrew is read from right to
>>> left.
>>> But if we look at the word as an array of characters, 'c_str', the
>>> values are:
>>> c_str[0] = "ש", c_str[1] = "ל", c_str[2] = "ו" and c_str[3] = "ם".
>>>
>>> So when printed from left to right, as is usually done, the word will
>>> be shown as:
>>> "םולש",
>>> since the characters are printed by their real order (called
>>> 'logical'), but start from the wrong side.
>>>
>>> To avoid this, a 'visual' ordering is used instead of the 'logical' one.
>>> So the word "שלום" will be stored as -
>>> c_str[0] = "ם", c_str[1] = "ו", c_str[2] = "ל" and c_str[3]="ש", so
>>> when printed from left to right, it will be displayed as
>>> "שלום" - which is the correct order.
>>>
>>> The visual ordering must be used carefully, though: when printing
>>> text that's split across several lines, it reverses the order of the
>>> lines too, since mirroring affects both axes.
>>> (i.e., "שלום לכם", with each word on a separate line, will be
>>> םולש
>>> םכל
>>> in logical ordering,
>>> and:
>>> לכם
>>> שלום
>>> in visual ordering, which are both wrong.)
>>> The solution is to mirror every line on its own, which must be
>>> done inside the wrapping function, not before or after it.
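Why the order of wrapping and mirroring matters can be shown with the standard `textwrap` module. A minimal sketch, assuming a toy `flip()` in place of `pyfribidi.log2vis` that is only correct for pure RTL text, and a line width of 4 characters chosen so that each word lands on its own line:

```python
import textwrap


def flip(text):
    # Toy stand-in for pyfribidi.log2vis; correct for pure RTL text only.
    return text[::-1]


text = "שלום לכם"  # logical order, two words

# Wrong: mirror the whole text first, then wrap. The lines come out in
# reversed order, so the first drawn line reads "לכם".
flip_then_wrap = textwrap.wrap(flip(text), width=4)

# Right: wrap the logical text first, then mirror each line separately.
wrap_then_flip = [flip(line) for line in textwrap.wrap(text, width=4)]
```

This is why the paragraph patch runs inside `wrap`, right after `self.breakLines` has split the logical text into lines.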
>>>
>>>
>>> Fribidi and Pyfribidi -
>>>
>>> A library that converts between 'logical' and 'visual' ordering,
>>> checks whether the text is RTL before mirroring it, and supports
>>> mixed texts, where only the RTL part should be mirrored, is fribidi,
>>> "An implementation of the Unicode Bidirectional Algorithm (bidi)":
>>> http://fribidi.org/.
>>>
>>> The Python binding for this library is called pyfribidi:
>>> http://pyfribidi.sourceforge.net/ . (It does not require fribidi
>>> itself to be installed.)
>>> All versions of pyfribidi should work fine, but the newest version
>>> is probably the best choice.
>>>
>>> My code -
>>> My code simply uses pyfribidi to add RTL (and mixed LTR and RTL
>>> string) support to reportlab.
>>> In "canvas.py" it simply runs 'pyfribidi.log2vis' on the text, while
>>> in "paragraph.py" it does so for each line separately.
>>>
>>> This is what I added to the "canvas.py" file, right at the beginning
>>> of the "drawString" function:
>>> ######################
>>> # Hebrew text patch, Moshe Wagner, June 2009
>>> # <moshe.wagner at gmail.com>
>>>
>>> # Flips the given text with pyfribidi, if needed (i.e. Hebrew or Arabic).
>>> # If it could not be imported, it does nothing.
>>> # Plain LTR texts will not be affected in any case.
>>> try:
>>>     import pyfribidi
>>>     text = pyfribidi.log2vis(text, base_direction=pyfribidi.ON)
>>>
>>> except ImportError:
>>>     import sys
>>>     print >> sys.stderr, "Fribidi module not found; You will not have RTL support for this paragraph"
>>> #####################
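Because the snippet above sits inside drawString, a missing pyfribidi triggers a fresh import attempt and prints the warning once for every string drawn (failed imports are not cached). A minimal sketch of a variant that imports once at module load and warns only once; `to_visual()` is a hypothetical helper name, not part of ReportLab, and the `log2vis` call is the same one used in the canvas.py patch:

```python
import sys

# Import once, at module load, instead of on every drawString() call.
try:
    import pyfribidi
except ImportError:
    pyfribidi = None
    sys.stderr.write("Fribidi module not found; "
                     "you will not have RTL support\n")


def to_visual(text):
    """Return text in visual order, or unchanged if pyfribidi is absent."""
    if pyfribidi is None:
        return text
    # base_direction=pyfribidi.ON lets fribidi choose the direction itself.
    return pyfribidi.log2vis(text, base_direction=pyfribidi.ON)
```

drawString would then call `text = to_visual(text)` as its first statement.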
>>>
>>> And this is what I added to "paragraph.py", in the "wrap" function,
>>> right after the call to "self.breakLines":
>>> ######################
>>> # Hebrew text patch, Moshe Wagner, June 2009
>>> # <moshe.wagner at gmail.com>
>>>
>>> # This code fixes paragraphs with RTL text.
>>>
>>> # It does so by flipping each line separately
>>> # (depending on the type of the line).
>>>
>>> # If fribidi can't be imported, it does nothing.
>>> # Plain LTR texts will not be affected in any case.
>>>
>>> try:
>>>     import pyfribidi
>>> except ImportError:
>>>     import sys
>>>     print >> sys.stderr, "Fribidi module not found; You will not have RTL support for this paragraph"
>>> else:
>>>     for line in blPara.lines:
>>>         if isinstance(line, (FragLine, ParaLines)):
>>>             # When the line is a FragLine or ParaLines, the text
>>>             # attribute of each of its words is flipped.
>>>             # Then the order of the words is flipped too,
>>>             # so that two word parts on the same line
>>>             # will be in the right order.
>>>             for word in line.words:
>>>                 word.text = pyfribidi.log2vis(word.text, base_direction=pyfribidi.ON)
>>>
>>>             line.words.reverse()
>>>
>>>         elif isinstance(line, tuple):
>>>             # When the line is just a tuple whose second value holds the words:
>>>             # since I couldn't change its value directly,
>>>             # it's done by merging the words, flipping them,
>>>             # and putting them back one by one into the second item.
>>>
>>>             s = ' '.join(line[1])
>>>             s = pyfribidi.log2vis(s, base_direction=pyfribidi.ON)
>>>             line[1][:] = s.split()
>>>         else:
>>>             print line.__class__.__name__
>>> ######################
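The tuple branch works because, although a tuple cannot have an item replaced, the list it holds can be mutated in place with slice assignment. A minimal sketch, assuming a tuple line shaped like `(width, words)` where only `line[1]`, the list of words, matters here, and a toy `flip()` in place of `pyfribidi.log2vis` that is only correct for pure RTL text:

```python
def flip(text):
    # Toy stand-in for pyfribidi.log2vis; correct for pure RTL text only.
    return text[::-1]


def flip_tuple_line(line):
    # line[1] = ... would raise TypeError, since tuples are immutable.
    # Slice assignment instead replaces the contents of the list object
    # the tuple already refers to.
    s = " ".join(line[1])
    line[1][:] = flip(s).split()


line = (12.5, ["שלום", "לכם"])  # hypothetical width value and words
words = line[1]
flip_tuple_line(line)
```

Flipping the joined line also reverses the word order, which is why this branch needs no separate `reverse()` call. Note that `split()` collapses runs of spaces into single separators.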
>>>
>>>
>>>
>>> I attached an archive containing a Hebrew font, and a test script.
>>> The script should test all the cases I know my patch should deal
>>> with, and includes an image of the correct results for comparison.
>>>
>>> Moshe
>>>
>>>
>>> On Fri, Jun 5, 2009 at 4:37 PM, Andy Robinson<andy at reportlab.com> wrote:
>>>> 2009/6/5 Moshe Wagner <moshe.wagner at gmail.com>:
>>>>> Is there any chance this could be added to the official code?
>>>>
>>>> Moshe, thanks very much for your contribution. We're happy in
>>>> principle to add this kind of patch but it would help a great deal if
>>>> you could produce two more things...
>>>>
>>>> (a) a suitable few paragraphs for us to put in a What's New page or the
>>>> user guide. Mention what pyfribidi is, what version is needed (if it
>>>> matters) and where to get it. Also mention what one needs to install
>>>> to view these things - do we need special fonts, Acrobat Language
>>>> packs etc..? Assume the reader knows nothing about RTL. Just
>>>> send text to me or the list and I'll add it to the docs and/or web
>>>> site.
>>>>
>>>> (b) most important of all, a small test script (see our 'tests'
>>>> folder) which generates some Hebrew and/or Arabic output, which we can
>>>> run and look at. The absolute ideal test script would have a bitmap
>>>> of the correct Hebrew to look at, and say "the text below should look
>>>> like the above", since I at least would not know if it was backwards
>>>> or forwards ;-)
>>>>
>>>> Most people in ReportLab are too busy to have been following this in
>>>> detail but we'd really welcome any improvement in this area. We are
>>>> also starting from zero knowledge of Hebrew and Arabic - unlike Asian
>>>> text which we deal with daily. There will be a release in a few weeks
>>>> and this would be a very valuable addition...
>>>>
>>>> Best Regards,
>>>>
>>>> --
>>>> Andy Robinson
>>>> CEO/Chief Architect
>>>> ReportLab Europe Ltd.
>>>> Media House, 3 Palmerston Road, Wimbledon, London SW19 1PG, UK
>>>> Tel +44-20-8545-1570
>>>> _______________________________________________
>>>> reportlab-users mailing list
>>>> reportlab-users at reportlab.com
>>>> http://two.pairlist.net/mailman/listinfo/reportlab-users
>>>>
>>>
>>
>