[reportlab-users] MediaWiki's "Download as PDF" feature uses ReportLab but has a problem
yaoziyuan at gmail.com
Tue Jan 10 12:23:59 EST 2012
On Wed, Jan 11, 2012 at 1:10 AM, Andy Robinson <andy at reportlab.com> wrote:
> On 10 January 2012 16:59, Yao Ziyuan <yaoziyuan at gmail.com> wrote:
>> I'm not familiar with Python. But I have a simple way for ReportLab to
>> process CJK line-wrapping transparently:
>> First, for every CJK character found in the text, insert a
>> U+200B ("zero-width space") after it. This will logically make every
>> CJK character a possible line-wrapping point.
>> Then, recognize U+200B as a kind of whitespace in ReportLab's non-CJK
>> line-wrapping code.
> That's clever! Thank you for this. I'll trust you that this works for
> Chinese, which unfortunately I don't speak/read/write.
> For Japanese, which I do know quite well, NOT every character is a
> good wrap point, and there are quite sophisticated rules about
> characters which should not begin or end a line. Our present
> algorithm is really a "Japanese wrapping", not "CJK".
> The right answer is still probably a unicode-based algorithm for all
> languages. I wish I had more time to work on it.
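A minimal sketch of the kind of rule Andy describes, in Python since ReportLab is a Python library. It assumes the naive model from the proposal above, where a break is a candidate between any two characters, and then vetoes breaks that would leave a "no line start" character at the beginning of a line or a "no line end" character at the end. The two character sets below are small illustrative subsets chosen for this example, not ReportLab's own tables; the full Japanese rules are specified in JIS X 4051, and a Unicode-wide approach is described in Unicode Standard Annex #14.

```python
# Illustrative subsets only (an assumption, not a complete kinsoku table):
# closing punctuation, small kana and the long-vowel mark should not start a line,
NO_LINE_START = set("、。，．）」』】〕〉》ゃゅょっャュョッー・！？")
# and opening brackets should not end a line.
NO_LINE_END = set("（「『【〔〈《")


def can_break_between(before: str, after: str) -> bool:
    """True if a line may end after `before` and the next line begin with `after`."""
    return before not in NO_LINE_END and after not in NO_LINE_START


def break_points(text: str) -> list[int]:
    """Indices i where a break is allowed between text[i-1] and text[i].

    Starts from the naive rule (every gap is a candidate) and filters it
    with the two hypothetical character sets above.
    """
    return [i for i in range(1, len(text)) if can_break_between(text[i - 1], text[i])]
```

For "これは「例」です。" this allows breaks at indices 1, 2, 3, 6 and 7, refusing to break just after 「, just before 」, or just before 。.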
OK, I just found these links useful for understanding CJK word wrapping:
However, these links mention that no word processor really takes all
these sophisticated rules into consideration. So instead of pursuing
perfection, ReportLab could simply stick to the most basic rule:
allow a line break after whitespace or after any CJK character. If
ReportLab does want to implement all the sophisticated rules, I suggest
reusing an existing open source Unicode line-breaking library instead
of reinventing the wheel.
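The basic rule above, combined with the U+200B trick from the original proposal, could look roughly like this. It is a hypothetical illustration, not ReportLab code: `is_cjk`, `insert_zwsp`, `split_chunks` and `wrap` are names made up for this sketch, `is_cjk` checks only a handful of common Unicode blocks, and widths are counted in characters rather than measured in points with font metrics as a real paragraph wrapper would do.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def is_cjk(ch: str) -> bool:
    """Rough CJK test over a few common Unicode blocks (an assumption:
    real coverage needs more ranges, e.g. the CJK extension planes)."""
    cp = ord(ch)
    return (
        0x3000 <= cp <= 0x303F      # CJK Symbols and Punctuation
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
        or 0xAC00 <= cp <= 0xD7AF   # Hangul Syllables
        or 0xFF00 <= cp <= 0xFFEF   # Halfwidth and Fullwidth Forms
    )


def insert_zwsp(text: str) -> str:
    """Insert U+200B after every CJK character so each one becomes a break point."""
    return "".join(ch + ZWSP if is_cjk(ch) else ch for ch in text)


def split_chunks(text: str) -> list[str]:
    """Split text into unbreakable chunks, each ending at a break opportunity.
    A space stays attached to the chunk before it; U+200B itself is dropped."""
    chunks, current = [], ""
    for ch in text:
        if ch == ZWSP:
            chunks.append(current)
            current = ""
        elif ch == " ":
            chunks.append(current + " ")
            current = ""
        else:
            current += ch
    chunks.append(current)
    return [c for c in chunks if c]


def wrap(text: str, width: int) -> list[str]:
    """Greedy wrap at spaces and after CJK characters (width in characters)."""
    lines, line = [], ""
    for chunk in split_chunks(insert_zwsp(text)):
        candidate = line + chunk
        # Trailing spaces do not count against the width.
        if line and len(candidate.rstrip()) > width:
            lines.append(line.rstrip())
            line = chunk.lstrip()  # a chunk longer than width just overflows
        else:
            line = candidate
    if line.strip():
        lines.append(line.rstrip())
    return lines
```

For example, `wrap("ReportLab 可以处理中文换行", 12)` gives `["ReportLab 可以", "处理中文换行"]`: the Latin word is kept whole, while the Chinese text may break after any character. Because the break points are just U+200B, the existing whitespace-based wrapping loop needs only one extra separator, which is the appeal of the proposal; the trade-off, as Andy notes, is that it ignores the Japanese rules about which characters may start or end a line.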
> - Andy
> reportlab-users mailing list
> reportlab-users at lists2.reportlab.com