[reportlab-users] Hebrew Support Patch

Moshe Wagner moshe.wagner at gmail.com
Mon Jun 8 05:00:27 EDT 2009


Sorry, it seems I posted the code with a stupid bug. This is how it
should be, of course:
(The other function is still fine.)
######################
# Hebrew text patch, Moshe Wagner, June 2009
# <moshe.wagner at gmail.com>

# This code fixes paragraphs with RTL text.

# It does so by flipping each line separately
# (depending on the type of the line).

# If pyfribidi can't be imported, it does nothing.
# Plain LTR texts will not be affected in any case.
try:
    import pyfribidi
except ImportError:
    import sys
    sys.stderr.write("Fribidi module not found; You will not "
                     "have RTL support for this paragraph\n")
else:

    # First, the base direction given to pyfribidi must be decided.
    # In justified paragraphs, it's decided by their alignment.

    # For now, there is only one type of fill-justified paragraph.
    # So even though it acts like a left-justified one,
    # we cannot assume that's the alignment the text should have.
    # So the direction is guessed from the first character of the text.

    if self.style.alignment == TA_LEFT:
        direction = pyfribidi.LTR
    elif self.style.alignment == TA_RIGHT:
        direction = pyfribidi.RTL
    else:
        # Get the first character of the text:
        c = ""
        # LTR unless a first character is found and says otherwise.
        direction = pyfribidi.LTR
        if isinstance(blPara.lines[0], (FragLine, ParaLines)):
            # Anything shorter than 2 must be English, because
            # Unicode chars take up 2 spaces in the array.
            if len(blPara.lines[0].words[0].text) >= 2:
                c = (blPara.lines[0].words[0].text[0] +
                     blPara.lines[0].words[0].text[1])
        elif isinstance(blPara.lines[0], tuple):
            if len(blPara.lines[0][1]) >= 2:
                c = blPara.lines[0][1][0] + blPara.lines[0][1][1]
        # Guess the direction from it, if there is anything to guess from:
        if c:
            direction = self.guessBaseDirection(c)

    for line in blPara.lines:
        if isinstance(line, (FragLine, ParaLines)):
            # When the line is a FragLine or ParaLines, the
            # text attribute of each of its words is flipped.
            # Then the order of the words is flipped too,
            # so that 2 word parts on the same line
            # will be in the right order.

            for word in line.words:
                word.text = pyfribidi.log2vis(word.text, direction)

            line.words.reverse()

        elif isinstance(line, tuple):
            # When the line is just a tuple whose second value is the text:
            # since I couldn't directly change its value,
            # it's done by merging the words, flipping them,
            # and re-entering them one by one into the second attribute.

            s = ' '.join(line[1])
            s = pyfribidi.log2vis(s, direction)
            line[1][:] = s.split()
        else:
            print(line.__class__.__name__)
######################
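
The two branches of the loop above can be compared with a small Python 3 sketch. It makes two assumptions that are not part of the patch: plain string reversal stands in for pyfribidi.log2vis (a fair substitute only for purely Hebrew text), and Word is a hypothetical stand-in for the word objects that carry a .text attribute.

```python
class Word:
    """Hypothetical stand-in for an object with a .text attribute."""

    def __init__(self, text):
        self.text = text


def flip(text):
    # Stand-in for pyfribidi.log2vis on purely RTL text (assumption).
    return text[::-1]


# "shalom lachem" in logical (reading) order, written with escapes.
logical_words = ["\u05e9\u05dc\u05d5\u05dd", "\u05dc\u05db\u05dd"]

# FragLine-style branch: flip each word, then reverse the word order.
words = [Word(w) for w in logical_words]
for word in words:
    word.text = flip(word.text)
words.reverse()
frag_result = [w.text for w in words]

# Tuple-style branch: the tuple itself is immutable, but the list it
# holds can be refilled in place with slice assignment.
line = (0, list(logical_words))
line[1][:] = flip(" ".join(line[1])).split()

print(frag_result == line[1])
```

Both branches end up with the same visual word order, which is why the per-word flip has to be followed by reversing the word list.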

Also, I forgot to point out that the archive I attached to my last email
includes the new test script for the patch.

Sorry again,
Moshe

On Mon, Jun 8, 2009 at 11:42 AM, Moshe Wagner<moshe.wagner at gmail.com> wrote:

> Well, I believe I fixed the system dealing with mixed texts.
>
> The problem was that fribidi must be given a base direction for every line.
> For instance, the line "hello world!" will stay intact when given to
> fribidi if the base direction is LTR, but should become "!hello
> world" if the base direction is RTL, since we want to look at it as
> part of an RTL paragraph, and therefore the end of the line is on the
> left.
>
> Until now, I used fribidi's auto detection for the base direction, but
> since I did it for every line separately, it would always treat a line
> like the one I gave above as LTR, ignoring the fact that it's in an
> RTL paragraph.
>
> The new code deals with that now by determining the base direction from
> the alignment type of the paragraph, and giving it to all lines in the
> paragraph.
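
The "hello world!" case can be sketched in Python 3 without fribidi. This is a deliberately simplified model, not the bidi algorithm: it only handles a line of Latin text followed by trailing punctuation, and the helper name ltr_line_to_visual and the set NEUTRAL_CLASSES are made up for the illustration. Only unicodedata.bidirectional is a real standard-library call.

```python
import unicodedata

# Bidi classes treated as "neutral" in this simplified model (assumption).
NEUTRAL_CLASSES = {"ON", "CS", "WS"}


def ltr_line_to_visual(line, base_rtl):
    """Visual order of Latin text with trailing punctuation.

    Hypothetical helper covering only this one case; it is not a
    replacement for fribidi.
    """
    if not base_rtl:
        return line  # LTR text in an LTR paragraph is left untouched
    end = len(line)
    while end > 0 and unicodedata.bidirectional(line[end - 1]) in NEUTRAL_CLASSES:
        end -= 1
    body, trailing = line[:end], line[end:]
    # Trailing neutrals take the paragraph direction (RTL), so they are
    # drawn, in reverse order, to the left of the LTR run.
    return trailing[::-1] + body


print(ltr_line_to_visual("hello world!", base_rtl=False))
print(ltr_line_to_visual("hello world!", base_rtl=True))
```

The same input gives two different results depending only on the base direction, which is why auto-detecting the direction line by line goes wrong inside an RTL paragraph.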

>
> The only problem is with filled paragraphs, where the base direction
> can't be determined, since it still could be either RTL or LTR.
> The best solution is to allow two types of filled alignments,
> FILL_LEFT and FILL_RIGHT, which are needed anyway for positioning the
> last line correctly, as I mentioned before.
> Could that be done?
>
>
> Anyway, until that can be done, I made the code use fribidi's auto
> detection for the base direction of filled paragraphs, so it's decided
> by their first character.
> It isn't ideal, but it works.
>
> So here is the new code:

> ######################
> # Hebrew text patch, Moshe Wagner, June 2009
> # <moshe.wagner at gmail.com>
>
> # This code fixes paragraphs with RTL text.
>
> # It does so by flipping each line separately
> # (depending on the type of the line).
>
> # If pyfribidi can't be imported, it does nothing.
> # Plain LTR texts will not be affected in any case.
> try:
>     import pyfribidi
> except ImportError:
>     import sys
>     sys.stderr.write("Fribidi module not found; You will not "
>                      "have RTL support for this paragraph\n")
> else:
>
>     # First, the base direction given to pyfribidi must be decided.
>     # In justified paragraphs, it's decided by their alignment.
>
>     # For now, there is only one type of fill-justified paragraph.
>     # So even though it acts like a left-justified one,
>     # we cannot assume that's the alignment the text should have.
>     # So the direction is guessed from the first character of the text.
>
>     if self.style.alignment == TA_LEFT:
>         direction = pyfribidi.LTR
>     elif self.style.alignment == TA_RIGHT:
>         direction = pyfribidi.RTL
>     else:
>         # Get the first character of the text:
>         c = ""
>         if isinstance(blPara.lines[0], (FragLine, ParaLines)):
>             c = (blPara.lines[0].words[0].text[0] +
>                  blPara.lines[0].words[0].text[1])
>         elif isinstance(blPara.lines[0], tuple):
>             c = blPara.lines[0][1][0] + blPara.lines[0][1][1]
>         # Guess the direction from it:
>         direction = self.guessBaseDirection(c)
>
>     for line in blPara.lines:
>         if isinstance(line, (FragLine, ParaLines)):
>             # When the line is a FragLine or ParaLines, the
>             # text attribute of each of its words is flipped.
>             # Then the order of the words is flipped too,
>             # so that 2 word parts on the same line
>             # will be in the right order.
>
>             for word in line.words:
>                 word.text = pyfribidi.log2vis(word.text, direction)
>
>             line.words.reverse()
>
>         elif isinstance(line, tuple):
>             # When the line is just a tuple whose second value is the text:
>             # since I couldn't directly change its value,
>             # it's done by merging the words, flipping them,
>             # and re-entering them one by one into the second attribute.
>
>             s = ' '.join(line[1])
>             s = pyfribidi.log2vis(s, direction)
>             line[1][:] = s.split()
>         else:
>             print(line.__class__.__name__)
> ######################

>

> And this function should be added before the 'wrap' function in the same class:
> ###############################
> # Guesses the direction the given text should have (LTR or RTL), for
> # cases where it can't be decided by its alignment
> def guessBaseDirection(self, s):
>     # Since pyfribidi doesn't have an option to return fribidi's guess,
>     # I have to find out its guess in a very ugly way.
>
>     # This adds a neutral sign to the given text.
>     # Then the text is mirrored, letting fribidi guess its direction.
>     # If it's RTL text, the added sign will now become the first
>     # character of the text,
>     # while if it's LTR the sign will stay at the end.
>     import pyfribidi
>
>     s += '.'
>     s = pyfribidi.log2vis(s, pyfribidi.ON)
>
>     if s[0] == ".":
>         return pyfribidi.RTL
>     else:
>         return pyfribidi.LTR
> ###############################
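
As a possible alternative to the trailing-dot trick, here is a minimal Python 3 sketch that guesses the base direction from the first strongly directional character, using the standard unicodedata module. This is not what the patch does: the function name guess_base_direction and the "RTL"/"LTR" return strings are illustrative assumptions, and it expects Unicode str input rather than encoded bytes.

```python
import unicodedata


def guess_base_direction(text, default="LTR"):
    """Guess "RTL" or "LTR" from the first strong character (sketch)."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):   # Hebrew, Arabic letters
            return "RTL"
        if bidi_class == "L":           # Latin and other LTR letters
            return "LTR"
    # Only digits, spaces or punctuation: nothing to decide from.
    return default


print(guess_base_direction("\u05e9\u05dc\u05d5\u05dd"))
print(guess_base_direction("hello"))
```

Digits and punctuation are skipped, so a line such as "123" followed by a Hebrew word is still classified by the Hebrew word.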

>
>
> Moshe
>
> Another point I got wrong is that pyfribidi *DOES* require fribidi
> itself, contrary to what I said before.
>
>
> On Sun, Jun 7, 2009 at 2:12 PM, Moshe Wagner<moshe.wagner at gmail.com> wrote:
>> Looking at my test again, I see there is still a bug with mixed texts.
>> I'll update when I fix it.
>>
>> Moshe
>>
>> On Sun, Jun 7, 2009 at 2:09 PM, Moshe Wagner<moshe.wagner at gmail.com> wrote:

>>> Well, I'm not a really great technical writer either, and I'm not
>>> really sure how much background you want. But I'll give the best
>>> description I can think of. As I said, feel free to change anything so
>>> it meets your requirements, or ask me to give more information on any
>>> point you think I didn't cover in enough detail.
>>>
>>> (Note: I don't know enough about any other languages, so I'm strictly
>>> speaking about Hebrew. Arabic, for instance, is very similar in terms
>>> of being RTL, but has a few very different properties, such as joined
>>> letters. I do believe fribidi deals with that correctly, and therefore
>>> my patch should add Arabic support too, but I cannot promise that. Is
>>> there anyone who can test this?)
>>>
>>> Displaying Hebrew -
>>>
>>> The first step in displaying any non-ASCII characters, and therefore
>>> Hebrew as well, is obtaining a font containing their glyphs. I
>>> didn't check all of the default PDF fonts, but those I did check did
>>> not include Hebrew glyphs.
>>> Instead, I use fonts from the Culmus project (http://culmus.sourceforge.net).
>>> (In the test archive I included one font from there, and its license
>>> file. I hope that's ok.)

>>>
>>>
>>> 'Visual' and 'Logical' ordering -
>>>
>>> Once a font with the characters is used, single Hebrew characters can
>>> be displayed, but the words will still come out mirrored, as I'll
>>> explain.
>>> Say we take the word hello, that is, "שלום" ("Shalom"). If you see it
>>> correctly, you will see the character "ש" at the rightmost part of
>>> that word, as it's the first letter, and Hebrew is read from right to
>>> left.
>>> But if we look at the word as an array of chars, 'c_str', the
>>> values would be:
>>> c_str[0] = "ש", c_str[1] = "ל", c_str[2] = "ו" and c_str[3] = "ם".
>>>
>>> So when printed from left to right, as is usually done, the word will
>>> be shown as:
>>> "םולש",
>>> since the characters are printed in their real order (called
>>> 'logical'), but start from the wrong side.
>>>
>>> To avoid this, a 'visual' ordering is used instead of the 'logical' one.
>>> So the word "שלום" will be stored as -
>>> c_str[0] = "ם", c_str[1] = "ו", c_str[2] = "ל" and c_str[3] = "ש", so
>>> when printed from left to right, it will be displayed as
>>> "שלום" - which is the correct order.
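
The two orderings can be shown with a few lines of Python 3. The sketch assumes a single, purely Hebrew word, where converting logical to visual order is plain reversal; mixed text needs a real bidi implementation such as fribidi. The letters are written as escapes so the source reads the same in any editor.

```python
# "shalom" in logical (reading) order: shin, lamed, vav, final mem.
logical = "\u05e9\u05dc\u05d5\u05dd"

# Visual order for a left-to-right renderer: the same letters reversed.
visual = logical[::-1]

# Show the code points of each ordering, index by index.
print([hex(ord(ch)) for ch in logical])
print([hex(ord(ch)) for ch in visual])
```

Index 0 of the logical string is shin, while index 0 of the visual string is final mem, matching the two c_str listings above.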

>>>
>>> The visual ordering must be used carefully, though, since when
>>> printing text that's split across a few lines, it will cause their
>>> order to switch too, as mirroring affects both axes.
>>> (i.e., "שלום לכם", when each word is on a separate line, will be
>>> םולש
>>> םכל
>>> in logical ordering,
>>> and:
>>> לכם
>>> שלום
>>> in visual, which are both wrong.)
>>> The solution is to mirror every line on its own, and that must be
>>> done in the wrapping function, not before or after it.
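
Why the mirroring has to happen per line, after wrapping, can be sketched in Python 3. The assumptions: wrap_words is a hypothetical stand-in for a real line breaker, and string reversal stands in for the logical-to-visual conversion, which is only valid for purely Hebrew text.

```python
def wrap_words(words, per_line):
    """Hypothetical stand-in for a line breaker: N words per line."""
    return [" ".join(words[i:i + per_line])
            for i in range(0, len(words), per_line)]


# "shalom lachem" in logical order.
logical = "\u05e9\u05dc\u05d5\u05dd \u05dc\u05db\u05dd"

# Wrong: mirror the whole text first, then wrap.
# The second word ends up on the first line.
mirrored_first = wrap_words(logical[::-1].split(), 1)

# Right: wrap the logical text, then mirror each line on its own.
wrapped_first = [line[::-1] for line in wrap_words(logical.split(), 1)]

print(mirrored_first)
print(wrapped_first)
```

Only the second approach keeps "shalom" on the first line while still storing each line in visual order.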

>>>
>>>
>>> Fribidi and Pyfribidi -
>>>
>>> Fribidi is a library that converts between 'logical' and 'visual'
>>> ordering. It tests whether the text is RTL before mirroring it, and
>>> supports mixed texts, where only the RTL part should be mirrored -
>>> "An implementation of the Unicode Bidirectional Algorithm (bidi)." -
>>> http://fribidi.org/.
>>>
>>> The Python binding for this library is called pyfribidi -
>>> http://pyfribidi.sourceforge.net/ . (It does not require fribidi
>>> itself installed.)
>>> All versions of pyfribidi should work fine, but I suppose the newest
>>> version should always be used.
>>>
>>> My code -
>>> My code simply uses pyfribidi to add support for RTL (and mixed LTR and
>>> RTL) strings to reportlab.
>>> In "canvas.py" it simply runs 'pyfribidi.log2vis' on the text, while
>>> in "paragraph.py" it does so for each line separately.
>>>
>>> This is what I have added to the "canvas.py" file, right at the beginning
>>> of the "drawString" function:

>>> ######################
>>> # Hebrew text patch, Moshe Wagner, June 2009
>>> # <moshe.wagner at gmail.com>
>>>
>>> # Flips the given text with pyfribidi, if it's needed (i.e. Hebrew or Arabic).
>>> # If it could not be imported, it does nothing.
>>> # Plain LTR texts will not be affected in any case.
>>> try:
>>>     import pyfribidi
>>>     text = pyfribidi.log2vis(text, base_direction=pyfribidi.ON)
>>>
>>> except ImportError:
>>>     import sys
>>>     sys.stderr.write("Fribidi module not found; You will not have RTL "
>>>                      "support for this paragraph\n")
>>> #####################
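
The same optional-dependency idea can be written as a standalone Python 3 helper, sketched here under assumptions: the name to_visual is made up for the example, and the import is attempted once at module level so the warning is printed once rather than on every call. It uses pyfribidi.log2vis with base_direction=pyfribidi.ON, as the patch does.

```python
import sys

try:
    import pyfribidi  # optional dependency
except ImportError:
    pyfribidi = None
    sys.stderr.write("pyfribidi not found; RTL text will not be reordered\n")


def to_visual(text):
    """Return text in visual order, or unchanged if pyfribidi is missing."""
    if pyfribidi is None:
        return text
    # ON lets fribidi detect the base direction from the text itself.
    return pyfribidi.log2vis(text, base_direction=pyfribidi.ON)


print(to_visual("hello"))
```

Plain LTR text comes back unchanged whether or not pyfribidi is installed, so callers do not need to check for the library themselves.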

>>>
>>> And this is what I added to "paragraph.py", in the "wrap" function,
>>> right after the call to "self.breakLines":
>>> ######################
>>> # Hebrew text patch, Moshe Wagner, June 2009
>>> # <moshe.wagner at gmail.com>
>>>
>>> # This code fixes paragraphs with RTL text.
>>>
>>> # It does so by flipping each line separately
>>> # (depending on the type of the line).
>>>
>>> # If pyfribidi can't be imported, it does nothing.
>>> # Plain LTR texts will not be affected in any case.
>>>
>>> try:
>>>     import pyfribidi
>>> except ImportError:
>>>     import sys
>>>     sys.stderr.write("Fribidi module not found; You will not have RTL "
>>>                      "support for this paragraph\n")
>>> else:
>>>     for line in blPara.lines:
>>>         if isinstance(line, (FragLine, ParaLines)):
>>>             # When the line is a FragLine or ParaLines, the
>>>             # text attribute of each of its words is flipped.
>>>             # Then the order of the words is flipped too,
>>>             # so that 2 word parts on the same line
>>>             # will be in the right order.
>>>             for word in line.words:
>>>                 word.text = pyfribidi.log2vis(word.text, base_direction=pyfribidi.ON)
>>>
>>>             line.words.reverse()
>>>
>>>         elif isinstance(line, tuple):
>>>             # When the line is just a tuple whose second value is the text:
>>>             # since I couldn't directly change its value,
>>>             # it's done by merging the words, flipping them,
>>>             # and re-entering them one by one into the second attribute.
>>>
>>>             s = ' '.join(line[1])
>>>             s = pyfribidi.log2vis(s, base_direction=pyfribidi.ON)
>>>             line[1][:] = s.split()
>>>         else:
>>>             print(line.__class__.__name__)
>>> ######################

>>>
>>>
>>>
>>> I attached an archive containing a Hebrew font, and a test script.
>>> The script should test all the cases I know of that my patch should
>>> deal with, and adds an image of good results for comparison.
>>>
>>> Moshe
>>>
>>>
>>> On Fri, Jun 5, 2009 at 4:37 PM, Andy Robinson<andy at reportlab.com> wrote:

>>>> 2009/6/5 Moshe Wagner <moshe.wagner at gmail.com>:
>>>>> Is there any chance this could be added to the official code?
>>>>
>>>> Moshe, thanks very much for your contribution. We're happy in
>>>> principle to add this kind of patch but it would help a great deal if
>>>> you could produce two more things...
>>>>
>>>> (a) a suitable few paragraphs for us to put in a What's New page or the
>>>> user guide. Mention what pyfribidi is, what version is needed (if it
>>>> matters) and where to get it. Also mention what one needs to install
>>>> to view these things - do we need special fonts, Acrobat Language
>>>> packs etc.? Assume the reader knows nothing about RTL. Just
>>>> send text to me or the list and I'll add it to the docs and/or web
>>>> site.
>>>>
>>>> (b) most important of all, a small test script (see our 'tests'
>>>> folder) which generates some Hebrew and/or Arabic output, which we can
>>>> run and look at. The absolute ideal test script would have a bitmap
>>>> of the correct Hebrew to look at, and say "the text below should look
>>>> like the above", since I at least would not know if it was backwards
>>>> or forwards ;-)
>>>>
>>>> Most people in ReportLab are too busy to have been following this in
>>>> detail but we'd really welcome any improvement in this area. We are
>>>> also starting from zero knowledge of Hebrew and Arabic - unlike Asian
>>>> text which we deal with daily. There will be a release in a few weeks
>>>> and this would be a very valuable addition...
>>>>
>>>> Best Regards,
>>>>
>>>> --
>>>> Andy Robinson
>>>> CEO/Chief Architect
>>>> ReportLab Europe Ltd.
>>>> Media House, 3 Palmerston Road, Wimbledon, London SW19 1PG, UK
>>>> Tel +44-20-8545-1570
>>>> _______________________________________________
>>>> reportlab-users mailing list
>>>> reportlab-users at reportlab.com
>>>> http://two.pairlist.net/mailman/listinfo/reportlab-users
>>>>
>>>
>>
>
