[reportlab-users] Splitting paragraph in table cell on '-' as well as white space

Richard Galka rgalka at seccuris.com
Wed Jan 19 11:40:08 EST 2011


I noticed that a question came up last month about breaking words/strings inside a table on special characters. We implemented this, and it works well in our environment.

To ensure long words would break inside table cells, we subclassed the Paragraph class and modified the 'breakLines' method to break either on special characters or by using a dictionary file (www.gutenberg.org/ebooks/3204 in particular).

We are using ReportLab 2.4. Our subclass adds two methods (breakOnSyntax and breakWithDictionary) and overrides the original Paragraph's __init__ and breakLines methods.

Hopefully this can help someone out, and again, please remember this is for ReportLab 2.4 and may not be relevant for other versions.


def __init__(self, text, style,
             bulletText=None,
             frags=None,
             caseSensitive=1,
             encoding='utf8',
             dictLoc=None,
             breakOnSpecial=True):
    """
    Initialize the Paragraph structure and the dictionary to use for breaking
    """
    self.breakOnSpecial = breakOnSpecial
    if dictLoc:
        self.breakOnDict = True
        self._dict = dictLoc
    else:
        self.breakOnDict = False
        self._dict = None
    Paragraph.__init__(self, text, style, bulletText, frags, caseSensitive, encoding)




def breakOnSyntax(self, word, syntax=('-', '/', '\\')):
    """ Split a word on the first matching symbol
    Args:
        word: A string in ascii / utf to be split
        syntax: A sequence of characters on which the word may be split.
            The sequence is priority based; the word is split on the
            first character in it that occurs in the word.
    Return:
        Returns a tuple composed of:
            The list of split words
            The character chosen to split the word ('' if none matched)
    """
    for s in syntax:
        newwords = word.split(s)
        if len(newwords) > 1:
            # s occurs in the word; report it as the separator used
            return (newwords, s)
    return ([word], '')
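
For anyone who wants to try the priority-based split outside of a Paragraph subclass, here is a minimal standalone sketch. The function name split_on_first_separator is made up for illustration; it is not part of ReportLab or of our subclass.

```python
def split_on_first_separator(word, separators=("-", "/", "\\")):
    """Split word on the first separator, in priority order, that occurs in it.

    Returns (pieces, separator); separator is '' when nothing matched.
    """
    for sep in separators:
        pieces = word.split(sep)
        if len(pieces) > 1:
            return pieces, sep
    return [word], ""


print(split_on_first_separator("state-of-the-art"))  # (['state', 'of', 'the', 'art'], '-')
print(split_on_first_separator("usr/local/lib-x"))   # (['usr/local/lib', 'x'], '-')
print(split_on_first_separator("unbreakable"))       # (['unbreakable'], '')
```

The second call shows the priority order at work: '-' is tried before '/', so the word is split on '-' even though '/' appears earlier in the string.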


def breakWithDictionary(self, word, dictFile=None, dictBreakChar=None):
    """ Split a word at the break points listed in a dictionary file
    Args:
        word: A string in ascii / utf to be split
        dictFile: (optional) Path to a dictionary file identifying lexical
            makeup; defaults to the dictLoc passed to __init__
        dictBreakChar: (optional) Character marking break points in the file
    Return:
        Returns a list composed of
            The split words, or just the original word if no entry matches
    """
    if dictBreakChar is None:
        dictBreakChar = chr(165)  # chr(165) used for Moby hyphenator II

    if dictFile is None:
        dictFile = self._dict

    try:
        dictionary = open(dictFile)
    except (IOError, TypeError):
        # No dictionary configured, or the file could not be opened
        return [word]

    newwords = []
    try:
        for line in dictionary:
            line = line.strip()
            if word == line.replace(dictBreakChar, ''):
                # Logical break in the word identified
                newwords = line.split(dictBreakChar)
                break
    finally:
        dictionary.close()

    if newwords:
        return newwords
    else:
        return [word]
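
The dictionary lookup can be sketched without a real file as follows. This is only an illustration under a few assumptions: the name split_with_dictionary and the two sample entries are invented, and the dictionary is assumed to hold one word per line with chr(165) marking each break point.

```python
import io

BREAK_CHAR = chr(165)  # break-point marker assumed for the dictionary file


def split_with_dictionary(word, lines, break_char=BREAK_CHAR):
    """Return the pieces of word as listed in lines, or [word] if not listed."""
    for line in lines:
        entry = line.strip()
        if entry.replace(break_char, "") == word:
            return entry.split(break_char)
    return [word]


# Invented sample entries standing in for a real dictionary file
sample = "\n".join([
    BREAK_CHAR.join(["ta", "ble"]),
    BREAK_CHAR.join(["hy", "phen", "a", "tion"]),
])

print(split_with_dictionary("hyphenation", io.StringIO(sample)))  # ['hy', 'phen', 'a', 'tion']
print(split_with_dictionary("unknownword", io.StringIO(sample)))  # ['unknownword']
```

Note that this scans the entries linearly for every lookup, just like the method above. For a large dictionary it may be worth loading the entries into a dict once and reusing it.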



In the breakLines(self, width) method, we copied the original Paragraph method and modified it as shown below:
... #Original Method code
...
wordcnt = 0
for word in words:
    wordcnt = wordcnt + 1
    newwords = []
    #Make a word array splitting long words
    wordWidth = pdfmetrics.stringWidth(word, fontName, fontSize, self.encoding)
    newWidth = currentWidth + spaceWidth + wordWidth

    if newWidth <= maxWidth:
        ...# Original Method Code
        ...
    else:
        if (newWidth - currentWidth) > maxWidth:
            # The word alone is wider than the line
            syntaxuse = ''
            #Break apart word if appropriate
            if self.breakOnSpecial:
                (newwords, syntaxuse) = self.breakOnSyntax(word)
            elif self.breakOnDict:
                newwords = self.breakWithDictionary(word)
            if newwords and len(newwords) == 1:
                word = newwords[0]
            elif newwords and len(newwords) > 1:
                #split into two words
                # Below attempts to split into two equal sized words.
                # TODO: Identify a better 'join' method using 'maxWidth' and font metrics.
                while len(newwords) > 2:
                    if len(newwords[0]) < len(newwords[-1]):
                        # Append beginning
                        tmp = newwords.pop(0)
                        newwords[0] = tmp + syntaxuse + newwords[0]
                    else:
                        # Append end
                        tmp = newwords.pop()
                        newwords[-1] = newwords[-1] + syntaxuse + tmp

                #Place newword on wordlist
                words.insert(wordcnt, newwords[1])
                word = newwords[0] + syntaxuse

            # Re-measure, since the word may have been shortened
            wordWidth = pdfmetrics.stringWidth(word, fontName, fontSize, self.encoding)
            newWidth = currentWidth + spaceWidth + wordWidth
        ...
        ... #Original Method code
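
The balancing loop used in breakLines can also be tried on its own. Here is a rough sketch; merge_to_two is a hypothetical helper name, and the piece lists are the ones a separator split or dictionary lookup might return.

```python
def merge_to_two(pieces, joiner=""):
    """Merge a list of word pieces down to two roughly equal halves.

    The shorter end piece is repeatedly glued onto its neighbour,
    with joiner re-inserted between the glued pieces.
    """
    pieces = list(pieces)  # work on a copy
    while len(pieces) > 2:
        if len(pieces[0]) < len(pieces[-1]):
            head = pieces.pop(0)
            pieces[0] = head + joiner + pieces[0]
        else:
            tail = pieces.pop()
            pieces[-1] = pieces[-1] + joiner + tail
    return pieces


halves = merge_to_two(["state", "of", "the", "art"], "-")
print(halves)                               # ['state-of', 'the-art']
print(halves[0] + "-")                      # 'state-of-' stays on the current line
print(merge_to_two(["hy", "phen", "a", "tion"]))  # ['hyphen', 'ation']
```

This compares piece lengths in characters, not rendered widths, so it is only an approximation; that is the limitation the TODO in the code above refers to.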



Richard Galka
Secure Software Analyst
Seccuris Inc.



This communication, including any attachments, does not necessarily represent official policy of Seccuris Inc.
Please see http://www.seccuris.com/Contact-PrivacyPolicy.htm for further details about Seccuris Inc.'s Privacy Policy.
If you have received this communication in error, please notify Seccuris Inc. at info at seccuris.com or at 1-866-644-8442.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://two.pairlist.net/pipermail/reportlab-users/attachments/20110119/e3a4d0ab/attachment-0001.htm>


More information about the reportlab-users mailing list