[reportlab-users] errors parsing & > in paragraphs

Wed Oct 19 02:37:00 EDT 2005

> *From:* Timothy Smith <timothy at open-networks.net>
> *To:* Support list for users of Reportlab software 
> <reportlab-users at reportlab.com>
> *Date:* Wed, 19 Oct 2005 10:09:18 +1000
> 
<earlier posts snipped>

> > From a prior post by Robin Becker I quote:
> > "you can escape all the < & > fairly easily. eg
> >
> > fld=fld.replace('&','&amp;').replace('<','&lt;').replace('>','&gt;')"
> > I originally used a slightly less compact form and thought this was nice.
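The order of the calls in the one-liner matters. '&' has to be replaced first, otherwise the '&' introduced by '&lt;' and '&gt;' gets escaped again. Here is a small Python 3 sketch showing both orders. The two function names are made up for illustration:

```python
def escape_amp_first(fld):
    # Robin Becker's one-liner: '&' is handled before '<' and '>'
    return fld.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')


def escape_amp_last(fld):
    # Wrong order, shown for contrast: the '&' of '&lt;' is escaped a second time
    return fld.replace('<', '&lt;').replace('>', '&gt;').replace('&', '&amp;')


print(escape_amp_first('a < b & c'))  # a &lt; b &amp; c
print(escape_amp_last('a < b & c'))   # a &amp;lt; b &amp; c
```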

> and there isn't anything else i might need to escape?

There are single and double quotes as well. But with the above you must be
sure that the text doesn't already contain escaped characters such as &amp;,
otherwise they are escaped a second time (&amp; becomes &amp;amp;). I use the
following (Python 2.3); it can probably be written more elegantly.
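For comparison, the Python standard library's xml.sax.saxutils.escape() does the same job as the one-liner. It escapes &, < and > in a safe order and accepts an optional dict of extra replacements, which covers the quotes. Like the one-liner, it does not look for existing entities. Here is a Python 3 sketch. The QUOTES mapping is an illustrative choice, not something ReportLab itself defines:

```python
from xml.sax.saxutils import escape

# escape() only handles '&', '<' and '>'; the quotes are passed as extras
QUOTES = {'"': '&quot;', "'": '&apos;'}

print(escape('it\'s "<b>"', QUOTES))
# it&apos;s &quot;&lt;b&gt;&quot;

# Text that is already escaped is escaped again:
print(escape('Tom &amp; Jerry'))
# Tom &amp;amp; Jerry
```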

def xss(astring):
    """ Escape the characters used in XML markup: <, >, &, ", '
        Usually called as Paragraph(xss(text), style)
        An & that already begins one of these entities is left alone.
    """
    esclist = [('&', '&amp;'), ('<', '&lt;'),
               ('>', '&gt;'), ('"', '&quot;'),
               ("'", "&apos;")]
    # entity names without the leading '&': 'amp;', 'lt;', 'gt;', ...
    entities = [esc[1:] for char, esc in esclist]

    # Escape '&' first so the entities added below are not re-escaped.
    # Every piece after the first one was preceded by an '&'.
    alist = astring.split('&')
    for m in range(1, len(alist)):
        for ent in entities:
            if alist[m].startswith(ent):
                break                     # already escaped, leave it
        else:
            alist[m] = 'amp;' + alist[m]  # bare '&' becomes '&amp;'
    astring = '&'.join(alist)

    # The remaining characters can simply be replaced.
    for char, esc in esclist[1:]:
        astring = astring.replace(char, esc)

    return astring.encode('latin-1')      # reportlab not unicode
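In present-day Python the same rule can be written with a regular expression and a negative lookahead. This is a sketch of that alternative technique, not the function above. The name escape_markup is hypothetical. Unlike xss(), it returns a str rather than latin-1 bytes, on the assumption that the ReportLab version in use accepts unicode text. It recognises the same five entities as xss(), so a numeric reference such as &#169; would still be escaped:

```python
import re

# An '&' that does not already begin one of the five entities xss() knows about
_BARE_AMP = re.compile(r'&(?!(?:amp|lt|gt|quot|apos);)')

# The other characters, replaced after '&' so their entities are not re-escaped
_OTHERS = [('<', '&lt;'), ('>', '&gt;'), ('"', '&quot;'), ("'", '&apos;')]


def escape_markup(text):
    """Escape <, >, &, " and ' for Paragraph markup, leaving existing
    &amp; &lt; &gt; &quot; &apos; entities untouched."""
    text = _BARE_AMP.sub('&amp;', text)
    for char, entity in _OTHERS:
        text = text.replace(char, entity)
    return text


print(escape_markup('Tom &amp; Jerry & <Spike>'))
# Tom &amp; Jerry &amp; &lt;Spike&gt;
```

Because existing entities are skipped, running the function twice gives the same result as running it once.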

Regards,

David Hughes
Forestfield Software
www.foresoft.co.uk