[reportlab-users] Hyphenation
Lele Gaifax
lele at metapensiero.it
Mon Jun 18 05:29:36 EDT 2018
Dinu Gherman <gherman at darwin.in-berlin.de> writes:
> I was browsing a few hours ago on “python hyphenation” and found some stuff
> I was not aware of, like http://pyphen.org.
Thank you Dinu,
The pyphen API is so straightforward that I could not resist trying to inject it
into the process, so I spent an hour this morning writing a quick&dirty hack
that can already handle the simplest case.
I wrote a PyphenParagraph class that accepts a "hyphenator" instance in its
constructor, overrides the "breakLines()" method and extends the "split()"
method. In "breakLines()", whenever it meets a word that does not fit in the
available space, it calls a new "hyphenateWord()" method that returns a
(headWord, tailWord) pair on success; the pair is then pushed back onto the
front of the "words" list.
Basically:

from reportlab.pdfbase.pdfmetrics import stringWidth
from reportlab.platypus.paragraph import Paragraph, _SplitText

class PyphenParagraph(Paragraph):
    def __init__(self, *args, hyphenator=None, **kwargs):
        self.hyphenator = hyphenator
        super().__init__(*args, **kwargs)

    def split(self, availWidth, availHeight):
        # Propagate the hyphenator to the split paragraphs: the parent's split()
        # uses "self.__class__(foo, bar, spam=eggs)" to create them...
        parts = super().split(availWidth, availHeight)
        # Loop instead of indexing [0] and [1]: split() may return fewer
        # than two parts.
        for part in parts:
            part.hyphenator = self.hyphenator
        return parts

    def hyphenateWord(self, word, availWidth, fontName, fontSize):
        # Return the first (head, tail) pair whose hyphenated head fits.
        for head, tail in self.hyphenator.iterate(word):
            head += '-'
            width = stringWidth(head, fontName, fontSize, self.encoding)
            if width <= availWidth:
                return _SplitText(head), tail
        return None  # no break point fits in the available width

    def breakLines(self, width):
        ...  # untouched code up to
        while words:
            word = words.pop(0)
            #this underscores my feeling that Unicode throughout would be easier!
            wordWidth = stringWidth(word, fontName, fontSize, self.encoding)
            newWidth = currentWidth + spaceWidth + wordWidth
            if newWidth>maxWidth:
                if self.hyphenator is not None and not isinstance(word, _SplitText):
                    pair = self.hyphenateWord(word, maxWidth - spaceWidth - currentWidth,
                                              fontName, fontSize)
                    if pair is not None:
                        words[0:0] = pair
                        continue
        ...  # untouched code till the end

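To make the selection logic of "hyphenateWord()" concrete without needing pyphen or ReportLab installed, here is a minimal standalone sketch. StubHyphenator, first_fitting_split() and the break table are hypothetical names made up for illustration, len() stands in for stringWidth(), and the sketch assumes that iterate() yields the pair with the longest head first:

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple


class StubHyphenator:
    """Hypothetical stand-in that only mimics an iterate() method."""

    def __init__(self, breaks: Dict[str, List[int]]):
        # Maps a word to the offsets where it may be broken.
        self.breaks = breaks

    def iterate(self, word: str) -> Iterator[Tuple[str, str]]:
        # Assumption: pairs are yielded with the longest head first.
        for pos in sorted(self.breaks.get(word, ()), reverse=True):
            yield word[:pos], word[pos:]


def first_fitting_split(
    word: str,
    avail_width: float,
    hyphenator: StubHyphenator,
    measure: Callable[[str], float] = len,
) -> Optional[Tuple[str, str]]:
    """Return the first (head + '-', tail) pair whose head fits, else None."""
    for head, tail in hyphenator.iterate(word):
        head += '-'
        if measure(head) <= avail_width:
            return head, tail
    return None


hyph = StubHyphenator({'hyphenation': [2, 6, 7]})
print(first_fitting_split('hyphenation', 8, hyph))
print(first_fitting_split('hyphenation', 7, hyph))
print(first_fitting_split('hyphenation', 2, hyph))
```

With widths counted in characters, a budget of 8 keeps the longest head, a budget of 7 falls back to the next shorter one, and a budget of 2 finds nothing, in which case the caller leaves the word untouched.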
However, I must be missing something about the "width" argument, because when
the paragraph sits inside an ImageAndFlowables, for example, it clearly uses the
wrong width in the "second" part (where the image ends, so a wider space is
available)...
Anyway, before going any further with my experiments, I would like to know
whether I am on the right track, to avoid wasting energy :-)
Here is my script: https://gist.github.com/lelit/9c1cba52fd6dd9f1123fe82ce4b788db
It obviously requires a "pip install pyphen" and a copy of RL's
tests/pythonpowered.gif. Executing it produces a simple document with two
paragraphs: the first has an image in its top left corner, the second is a
plain paragraph. The latter is correct, while in the former you can spot a
"bogus" hyphenation in the "Les-ser GPL" line...
Thanks in advance for any hint,
ciao, lele.
--
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
lele at metapensiero.it | -- Fortunato Depero, 1929.