[reportlab-users] MediaWiki's "Download as PDF" feature uses ReportLab but has a problem

Yao Ziyuan yaoziyuan at gmail.com
Tue Jan 10 12:23:59 EST 2012

On Wed, Jan 11, 2012 at 1:10 AM, Andy Robinson <andy at reportlab.com> wrote:

> On 10 January 2012 16:59, Yao Ziyuan <yaoziyuan at gmail.com> wrote:
>> I'm not familiar with Python. But I have a simple way for ReportLab to
>> process CJK line-wrapping transparently:
>>
>> Before everything, for every CJK character found in the text, insert a
>> U+200B ("zero-width space") after it. This will logically make every
>> CJK character a possible line-wrapping point.
>>
>> Then, recognize U+200B as a kind of whitespace in ReportLab's non-CJK
>> line-wrapping code.
>
> That's clever!  Thank you for this. I'll trust you that this works for
> Chinese, which unfortunately I don't speak/read/write.
>
> For Japanese, which I do know quite well, NOT every character is a
> good wrap point, and there are quite sophisticated rules about
> characters which should not begin or end a line.  Our present
> algorithm is really a "Japanese wrapping", not "CJK".
>
> The right answer is still probably a unicode-based algorithm for all
> languages.  I wish I had more time to work on it.

OK, I just found these links useful for CJK word wrap knowledge:

However, these links mention that no word processor really takes all
these sophisticated rules into consideration. So instead of pursuing
perfection, ReportLab can simply stick to the most basic rule: wrap
either after whitespace or after a CJK character. If ReportLab does
want to implement all the sophisticated rules, I suggest reusing an
existing open source Unicode word-wrap library instead of reinventing
the wheel.
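
To make the basic rule concrete, here is a minimal Python sketch of the
idea. It is not ReportLab code: the names mark_break_points and wrap are
hypothetical, the CJK ranges are a rough assumption (common ideographs,
kana and Hangul only), and width is counted in characters rather than
measured with real font metrics, which an actual implementation would
need to do.

```python
import re

ZWSP = "\u200b"  # U+200B, zero-width space

# Assumption: a rough, incomplete set of CJK code point ranges.
_CJK = re.compile(
    "[\u3040-\u30ff"   # Hiragana and Katakana
    "\u3400-\u4dbf"    # CJK Unified Ideographs Extension A
    "\u4e00-\u9fff"    # CJK Unified Ideographs
    "\uac00-\ud7af"    # Hangul Syllables
    "\uf900-\ufaff]"   # CJK Compatibility Ideographs
)


def mark_break_points(text):
    """Insert U+200B after every CJK character (hypothetical helper)."""
    return _CJK.sub(lambda m: m.group(0) + ZWSP, text)


def wrap(text, width):
    """Greedy wrap that may break after a space or after U+200B.

    Width is counted in characters, which is a simplification.
    """
    # Each chunk is an unbreakable run followed by its separators.
    chunks = re.findall("[^ \u200b]+[ \u200b]*", mark_break_points(text))
    lines, current = [], ""
    for chunk in chunks:
        visible = chunk.replace(ZWSP, "")  # U+200B is never printed
        if current and len(current) + len(visible.rstrip()) > width:
            lines.append(current.rstrip())
            current = ""
        current += visible
    if current.rstrip():
        lines.append(current.rstrip())
    return lines


if __name__ == "__main__":
    for line in wrap("hello 中文字符 world", 8):
        print(line)
```

This only implements the "break after whitespace or after any CJK
character" rule. It does nothing about the Japanese rules Andy mentions
for characters that should not begin or end a line.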


> - Andy
