robin at reportlab.com
Sun Jun 24 07:42:50 EDT 2018
I have checked in a branch 'hyphenation' with a working approach to
hyphenation. Currently the hyphenation is set by a paragraph instance
attribute 'hyphenator' or as a style attribute 'hyphenationLang'.
If the result is a callable, it is assumed to map unicode 'words' to
pairs. A simple way of selecting the best split is used. If the pair is split
we can add a hyphen (currently '-', the hyphen-minus). The logic for using
the result and recombining if the layout needs re-doing (which was Lele's
nightmare) seems to work for both simple and non-simple paragraphs (ie the
word and frag paragraph cases).
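
As a rough illustration of the callable contract described above, here is a
minimal sketch of a hyphenator that maps a word to (head, tail) pairs, a simple
best-split selection, and the recombination step. Every name here
(toy_hyphenator, best_split, recombine) and the hard-coded split table are
assumptions made for illustration, not the code in the 'hyphenation' branch; in
practice a pyphen dictionary would presumably supply the pairs.

```python
from typing import Callable, Iterable, Optional, Tuple

Pair = Tuple[str, str]
Hyphenator = Callable[[str], Iterable[Pair]]

# Hypothetical stand-in for a dictionary-backed hyphenator: the split
# positions are hard-coded purely for illustration.
_TOY_SPLITS = {"hyphenation": (2, 6, 7)}


def toy_hyphenator(word: str) -> Iterable[Pair]:
    """Map a word to its candidate (head, tail) pairs."""
    return [(word[:i], word[i:]) for i in _TOY_SPLITS.get(word.lower(), ())]


def best_split(
    word: str,
    avail: int,
    hyphenator: Hyphenator,
    hyphen: str = "-",
    width: Callable[[str], int] = len,
) -> Optional[Pair]:
    """Pick the longest head that, with the hyphen added, fits in `avail`.

    `width` measures a string; len() stands in for a real font metric.
    Returns (head + hyphen, tail), or None if no candidate fits.
    """
    best = None
    for head, tail in hyphenator(word):
        if not head or not tail:
            continue  # ignore degenerate pairs
        if width(head + hyphen) <= avail:
            if best is None or len(head) > len(best[0]):
                best = (head, tail)
    if best is None:
        return None
    return best[0] + hyphen, best[1]


def recombine(head: str, tail: str, hyphen: str = "-") -> str:
    """Undo a split when the layout has to be redone."""
    if hyphen and head.endswith(hyphen):
        head = head[: -len(hyphen)]
    return head + tail
```

With this toy table, best_split("hyphenation", 7, toy_hyphenator) gives
("hyphen-", "ation"), and recombine("hyphen-", "ation") restores the word.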
All the tests pass, but I haven't yet checked in python >= 3.3; even so the
results of this primitive approach are not yet useful because
1) we need to decide rules for when hyphenation should be done (ie what
kind of words are good) the old rule used to be letters followed by one
2) I'm not yet handling the case of words which have multiple styles. That
is doable, but is it desirable?
3) rl will be handling splitting of long URIs separately, but the rules
for when this should be done are up for discussion, eg if a long url will
fit on the next page should we just punt the split to (</br>,uri) etc.
4) The current hyphenation is based on pyphen. I'm sure the old wordaxe
paragraph has the answers to any of these issues so I will take a good look
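
To make the open questions in points 1 and 3 concrete, here is a minimal sketch
of one possible eligibility test: skip anything that looks like a URL, strip
surrounding punctuation, and only hyphenate a purely alphabetic core of some
minimum length. The function names, the regular expressions and the min_len
threshold are all assumptions for discussion, not rules ReportLab has adopted.

```python
import re
from typing import Optional, Tuple

# Assumption: a token starting with "scheme://" is treated as a URL.
_URL_RE = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)

# Leading non-word chars, a run of letters only, trailing non-word chars.
# [^\W\d_] matches any Unicode letter (word chars minus digits and "_").
_CORE_RE = re.compile(r"^(\W*)([^\W\d_]+)(\W*)$")


def looks_like_url(token: str) -> bool:
    """True if the token starts with something like 'https://'."""
    return bool(_URL_RE.match(token))


def split_core(token: str) -> Optional[Tuple[str, str, str]]:
    """Return (leading, alphabetic core, trailing), or None if the token
    is not a single run of letters wrapped in punctuation."""
    m = _CORE_RE.match(token)
    return m.groups() if m else None


def should_hyphenate(token: str, min_len: int = 5) -> bool:
    """Hypothetical rule: hyphenate only non-URL tokens whose alphabetic
    core has at least `min_len` letters."""
    if looks_like_url(token):
        return False
    parts = split_core(token)
    return parts is not None and len(parts[1]) >= min_len
```

Under these assumptions only the core returned by split_core would be passed to
the hyphenator, with the leading and trailing punctuation reattached afterwards;
URLs would be left to whatever separate splitting rule is eventually agreed.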
On 22 June 2018 at 20:44, Robin Becker <robin at reportlab.com> wrote:
> Hi Lele,
> I am allowing changes in the paragraph style setting. So the rl_settings
> value is just the default.
> On 22 June 2018 at 18:08, Lele Gaifax <lele at metapensiero.it> wrote:
>> Robin Becker <robin at reportlab.com> writes:
>> > I have mostly got hyphenation working using Pyphen. Currently about 5
>> > tests fail because of simple paragraph corner cases involving splits.
>> > I will try and finish a working version next week.
>> Great news, thank you!
>> > Unfortunately my simple approach is also trying to hyphenate things like
>> > URLs which I suppose should be handled separately.
>> > Also currently I lack a way to get just the word and not
>> > non-alphabetics. I
>> > suppose that should be easy if we know what constitutes hyphenatable
>> > Any ideas welcome.
>> IMHO I would leave such a decision to the final user, as I do not think
>> there is one single *right* answer... Even for URLs, which I actually happen to
>> in the app where I experimented with this matter, I could not reach a
>> consensus on what should happen, I mean between what currently happens:
>> |This is a long URL: https://hostname/contentna|
>> and with pyphen:
>> |This is a long URL: https://hostname/content- |
>> it's obviously debatable...
>> > I am presently running with the idea of using a string setting to
>> > what language so my settings override has
>> > hyphenationLang='en_GB'
>> > which corresponds to one of the pyphen dictionaries. This gets into the
>> > style and is used only if pyphen can be imported.
>> Not sure what you mean here, but just to make my use case clear: the app
>> currently developing is multilingual, and produces several PDFs for a
>> "item", one for each language the customer decided to support. So having a
>> "static" setting for the target language would not work very well for
>> best would be having a way to pass the hyphenator to the Paragraph
>> constructor, possibly taking a default from the SimpleDocument...
>> I will surely try out your solution as soon as it hits the repository and
>> report back.
>> Thanks again,
>> ciao, lele.
>> nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
>> real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
>> lele at metapensiero.it | -- Fortunato Depero, 1929.
> Robin Becker