[reportlab-users] Hyphenation
Henning von Bargen
H.vonBargen at t-p.com
Fri Apr 27 04:27:39 EDT 2018
Historically, I developed wordaxe with the goal of supporting automatic hyphenation for German texts with RL. Back then, I used RL in my job to create tables dynamically. Long words are common in German texts, and they waste a lot of space on the page when columns are relatively narrow. Furthermore, RL would not force a break inside a word back then, so long words ran across column borders.
For the job at hand, a brute-force approach to avoid this was good enough.
As a *hobby project*, I thought it would be great if the library provided correct hyphenation. While researching, I came across a project at the Technical University of Vienna which showed that correct hyphenation requires an understanding of how compound words are formed: one has to find the components of each word. I created a hyphenation algorithm (very slow!) which takes a given word, uses a special dictionary to find all the different ways the word could have been composed, and considers the hyphenation positions of each composition. It then removes those positions which occur in some but not all compositions, because breaking there could mislead the reader.
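To illustrate the idea of keeping only the positions shared by every composition, here is a minimal Python sketch. It is not the wordaxe code: the toy dictionary `COMPONENTS`, the function names, and the per-component hyphenation positions are all assumptions made up for this example. It uses the ambiguous word "Wachstube", which can be read as Wach+stube or Wachs+tube.

```python
from typing import Dict, List, Set

# Hypothetical toy dictionary (an assumption for this sketch, not wordaxe data):
# component -> hyphenation positions inside that component.
COMPONENTS: Dict[str, Set[int]] = {
    "wach": set(),
    "stube": {3},   # stu-be
    "wachs": set(),
    "tube": {2},    # tu-be
}


def decompositions(word: str, components: Dict[str, Set[int]]) -> List[List[str]]:
    """Return every way to split `word` into known components."""
    if not word:
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in components:
            for rest in decompositions(word[end:], components):
                results.append([head] + rest)
    return results


def positions_for(parts: List[str], components: Dict[str, Set[int]]) -> Set[int]:
    """Hyphenation positions (as offsets into the word) for one composition."""
    positions: Set[int] = set()
    offset = 0
    for index, part in enumerate(parts):
        if index > 0:
            positions.add(offset)  # boundary between two components
        positions.update(offset + p for p in components[part])
        offset += len(part)
    return positions


def safe_hyphenation_points(word: str, components: Dict[str, Set[int]]) -> Set[int]:
    """Keep only the positions that occur in every possible composition."""
    candidates = [
        positions_for(parts, components)
        for parts in decompositions(word.lower(), components)
    ]
    if not candidates:
        return set()  # unknown word: no positions are offered
    return set.intersection(*candidates)


points = safe_hyphenation_points("Wachstube", COMPONENTS)
```

With this toy dictionary, Wach+stube yields positions 4 and 7, while Wachs+tube yields 5 and 7, so only position 7 ("Wachstu-be") survives. The exhaustive recursion over all splits also hints at why such an approach is slow for long compounds.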
It soon turned out that this was too huge a task for a one-man hobby project.
Thus, I developed a common API with alternative implementations. One implementation uses the publicly available dictionaries from Open Office (I don't know if OO still uses the same dictionary format).
On the RL paragraph side, I finally gave up trying to understand what the original code was actually doing, so I started writing my own paragraph class, which was API-compatible with the RL original (inspired by Dinu's attempt).
I added support for using the kerning information of TrueType fonts, too.
All this added up to several weeks of work, still only for a hobby project.
In my job, I am using Eclipse BIRT and hardly ever RL.
Today, with two kids and approaching 50, I just don't feel right spending my spare time coding.
The last thing I did, IIRC, was to make the code ready for Python 3.
But since I have not actively used the code myself for 10 years or so, I abandoned it and told RL that they may adapt the code into the RL library if they like.
The repo is still on SourceForge. AFAIK the rst2pdf library by Roberto Alsina used/uses it.
Henning