[reportlab-users] MediaWiki's "Download as PDF" feature uses ReportLab but has a problem

Yao Ziyuan yaoziyuan at gmail.com
Tue Jan 10 11:59:16 EST 2012


On Wed, Jan 11, 2012 at 12:40 AM, Andy Robinson <andy at reportlab.com> wrote:

> On 10 January 2012 16:19, Ziyuan Yao <yaoziyuan at gmail.com> wrote:
>>
>> Actually, ReportLab doesn't need a wordwrap=CJK option. Instead, ReportLab
>> can wrap text (Western or CJK or mixed) in a unified manner:
>>
>> IF there is a whitespace near the page's right margin THEN
>>         wrap after that whitespace;
>> ELSE IF there is a CJK character near the page's right margin THEN
>>         wrap after that CJK character;
>> ELSE
>>         wrap forcibly at the page's right margin.
>>

>
> I understand the principles fully, but I am sorry that we haven't yet
> found the time to implement this.  When we launched it was
> pre-Unicode.  ReportLab has one major Asian-language commercial
> customer, who is quite happy with their output now as they don't mix
> languages or have long English technical expressions.  When we first
> wrote the package it was before Python's Unicode support.  We would
> probably also need some support from C code for speed.
>
> If some contributors (e.g. you?) have time to work on this and supply
> a better wrapping algorithm, we would be very happy to review code and
> migrate everything onto it.
>
> The ideal wrapping algorithm must support
> (a) CJK wrapping when detected
> (b) hyphenation (for long German words etc), and some sane rules for
> breaking long URLs
> (c) inline non-text objects, such as equation images used heavily by Wikipedia
> (d) support for varying fonts, and maybe even kerning or horizontal
> compression, and
> (e) right-to-left text for Arabic.
>
> This is not a trivial problem.  We "cheated" badly by having an
> English and then a CJK wrapping algorithm which is how we got to the
> present position.
>
> I would love to have some more people working on it but sadly it's not
> a requirement for current customers and our team is pretty busy these
> days...


I'm not familiar with Python. But I have a simple way for ReportLab to
process CJK line-wrapping transparently:

First, before any wrapping is done, insert a U+200B ("zero-width
space") after every CJK character found in the text. This logically
makes every CJK character a possible line-wrapping point.

Then, recognize U+200B as a kind of whitespace in ReportLab's non-CJK
line-wrapping code.

This way, ReportLab won't need a separate wordwrap=CJK wrapping
algorithm. It will be able to handle CJK using the same wrapping
algorithm for Western text.


>
> Best Regards,
> --
> Andy Robinson
> Managing Director
> ReportLab Europe Ltd.
> Thornton House, Thornton Road, Wimbledon, London SW19 4NG, UK
> Tel +44-20-8405-6420
