[reportlab-users] Platypus XML markup suggestion

Christoph Zwerschke reportlab-users@reportlab.com
Thu, 22 May 2003 16:57:04 +0200


This is my first posting in this list, and I want to first express thanks
for this great and useful tool. I am using it to dynamically generate
documents from a web application based on Webware. More concretely, I produce
PDF sheets with guidelines for handling hazardous materials from a database
of hazardous substances.

Here is a suggestion for improving the Platypus intraparagraph markup.

In the current version it is possible to print the German umlaut a (ä) with
the unnamed (numeric) entity &#228;, but not with the named entity &auml;.
Conversely, it is possible to print the Greek letter alpha with the named
entity &alpha;, but not with the equivalent unnamed entity &#945;. This
behaviour is quite inconsistent. My database contains substance names with
German umlauts and Greek letters. I generally store them as unnamed entities
because some web browsers do not support all named entities or spell them
differently. As a result, Greek letters and symbols such as "less than or
equal" (used, for instance, when specifying concentrations in percent) do not
appear in the PDF documents produced by ReportLab.
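The inconsistency is easy to see outside ReportLab. The following is a minimal sketch that uses Python 3's standard `html.unescape` (not ReportLab) to show that each named entity and its unnamed counterpart denote the same character. It also shows the split the patch relies on: ä lies in the 0-255 range, while alpha and "less than or equal" do not. The three sample pairs are chosen only for illustration.

```python
import html

# Named and unnamed (numeric) spellings of the same characters (HTML 4 names).
pairs = [('&auml;', '&#228;'), ('&alpha;', '&#945;'), ('&le;', '&#8804;')]

for named, numeric in pairs:
    ch = html.unescape(named)
    # Both spellings decode to the same character.
    assert ch == html.unescape(numeric)
    # Code points above 255 are the ones the patch looks up in the Symbol table.
    kind = '0-255 range' if ord(ch) <= 255 else 'above 255'
    print(named, numeric, 'U+%04X' % ord(ch), kind)
```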

Here is a solution which solves the general problem and adds unnamed
entities for the Symbol font to the platypus/paraparser.py module.

You need to insert code in two places. First, add a handle_charref handler
for the unnamed entities. The best place is directly before the block that
defines handle_entityref, which starts with the line #### greek script:


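    #### add symbol encoding
    def handle_charref(self, name):
        # name is the text between '&#' and ';', e.g. '945' or 'x3B1'
        try:
            if name[:1] in ('x', 'X'):
                n = int(name[1:], 16)   # hexadecimal reference, e.g. &#x3B1;
            else:
                n = int(name)           # decimal reference, e.g. &#945;
        except ValueError:
            self.unknown_charref(name)
            return
        if 0 <= n <= 255:
            # 0-255: pass the character straight through in the current font
            self.handle_data(chr(n))
        else:
            try:
                c = symenc[n]
            except KeyError:
                self.unknown_charref(name)
                return
            # render the mapped character in the Symbol font
            self._push(greek=1)
            self.handle_data(c)
            self._pop(greek=1)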
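The handler's dispatch logic can be tried without patching ReportLab. The sketch below makes a few assumptions. It uses Python 3's `html.parser.HTMLParser` with `convert_charrefs=False`, so that `handle_charref` receives the raw reference text; this parser is a stand-in for the one paraparser.py actually subclasses. The class name `RunCollector`, the `'body'`/`'symbol'` run labels, the `'?'` placeholder and the four-entry table are all hypothetical. They stand in for the real font switching (`_push(greek=1)`), `unknown_charref` and `symenc`.

```python
from html.parser import HTMLParser

# Hypothetical four-entry subset of symenc: code point -> Symbol-font character.
SYMENC_SUBSET = {945: 'a', 946: 'b', 8804: '\243', 8747: '\362'}


class RunCollector(HTMLParser):
    """Split marked-up text into (font, text) runs, resolving numeric
    character references with the same dispatch as handle_charref."""

    def __init__(self, symenc):
        # Keep &#...; references unconverted so handle_charref is called.
        super().__init__(convert_charrefs=False)
        self.symenc = symenc
        self.runs = []

    def _emit(self, font, text):
        # Merge with the previous run when the font does not change.
        if self.runs and self.runs[-1][0] == font:
            self.runs[-1] = (font, self.runs[-1][1] + text)
        else:
            self.runs.append((font, text))

    def handle_data(self, data):
        self._emit('body', data)

    def handle_charref(self, name):
        try:
            n = int(name[1:], 16) if name[:1] in ('x', 'X') else int(name)
        except ValueError:
            self._emit('body', '?')   # malformed reference: placeholder
            return
        if 0 <= n <= 255:
            self._emit('body', chr(n))                # 0-255: body font
        elif n in self.symenc:
            self._emit('symbol', self.symenc[n])      # mapped: Symbol font
        else:
            self._emit('body', '?')   # unmapped reference: placeholder


p = RunCollector(SYMENC_SUBSET)
p.feed('K&#228;se: c &#8804; 5 %, &#x3B2;')
p.close()
print(p.runs)
```

The body-font and Symbol-font pieces come out as separate runs. This is the same separation the patch gets by pushing and popping the `greek` state around each mapped character.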


Second, add the mapping for the unnamed entities. The best place is directly
before the mapping for the named entities, which begins greeks = {
'alpha':'a', ...

# mapping of XML numeric character references (code points) to Symbol font characters
symenc = {
    # greek letters
    913:'A', # Alpha
    914:'B', # Beta
    915:'G', # Gamma
    916:'D', # Delta
    917:'E', # Epsilon
    918:'Z', # Zeta
    919:'H', # Eta
    920:'Q', # Theta
    921:'I', # Iota
    922:'K', # Kappa
    923:'L', # Lambda
    924:'M', # Mu
    925:'N', # Nu
    926:'X', # Xi
    927:'O', # Omicron
    928:'P', # Pi
    929:'R', # Rho
    931:'S', # Sigma
    932:'T', # Tau
    933:'U', # Upsilon
    934:'F', # Phi
    935:'C', # Chi
    936:'Y', # Psi
    937:'W', # Omega
    945:'a', # alpha
    946:'b', # beta
    947:'g', # gamma
    948:'d', # delta
    949:'e', # epsilon
    950:'z', # zeta
    951:'h', # eta
    952:'q', # theta
    953:'i', # iota
    954:'k', # kappa
    955:'l', # lambda
    956:'m', # mu
    957:'n', # nu
    958:'x', # xi
    959:'o', # omicron
    960:'p', # pi
    961:'r', # rho
    962:'V', # sigmaf
    963:'s', # sigma
    964:'t', # tau
    965:'u', # upsilon
    966:'j', # phi
    967:'c', # chi
    968:'y', # psi
    969:'w', # omega
    977:'J', # thetasym
    978:'\241', # upsih
    982:'v', # piv
    # mathematical symbols
    8704:'"', # forall
    8706:'\266', # part
    8707:'$', # exist
    8709:'\306', # empty
    8711:'\321', # nabla
    8712:'\316', # isin
    8713:'\317', # notin
    8715:'\'', # ni
    8719:'\325', # prod
    8721:'\345', # sum
    8722:'-', # minus
    8727:'*', # lowast
    8730:'\326', # radic
    8733:'\265', # prop
    8734:'\245', # infin
    8736:'\320', # ang
    8743:'\331', # and
    8744:'\332', # or
    8745:'\307', # cap
    8746:'\310', # cup
    8747:'\362', # int
    8756:'\\', # there4
    8764:'~', # sim
    8773:'@', # cong
    8776:'\273', # asymp
    8800:'\271', # ne
    8801:'\272', # equiv
    8804:'\243', # le
    8805:'\263', # ge
    8834:'\314', # sub
    8835:'\311', # sup
    8836:'\313', # nsub
    8838:'\315', # sube
    8839:'\312', # supe
    8853:'\305', # oplus
    8855:'\304', # otimes
    8869:'^', # perp
    8901:'\327', # sdot
    9674:'\340', # loz
    # technical symbols
    8968:'\351', # lceil
    8969:'\371', # rceil
    8970:'\353', # lfloor
    8971:'\373', # rfloor
    9001:'\341', # lang
    9002:'\361', # rang
    # arrow symbols
    8592:'\254', # larr
    8593:'\255', # uarr
    8594:'\256', # rarr
    8595:'\257', # darr
    8596:'\253', # harr
    8656:'\334', # lArr
    8657:'\335', # uArr
    8658:'\336', # rArr
    8659:'\337', # dArr
    8660:'\333', # hArr
    # miscellaneous symbols
    8226:'\267', # bull
    8230:'\274', # hellip
    8242:'\242', # prime
    8254:'`', # oline
    8260:'\244', # frasl
    8472:'\303', # weierp
    8465:'\301', # image
    8476:'\302', # real
    8482:'\344', # trade
    8364:'\240', # euro
    8501:'\300', # alefsym
    9824:'\252', # spades
    9827:'\247', # clubs
    9829:'\251', # hearts
    9830:'\250' # diams
}
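The comments in the table use the HTML 4 entity names. You can therefore check each code point against Python 3's standard `html.entities.codepoint2name`. Because a Python dict literal silently keeps only the last value for a repeated key, the sketch below takes the entries as a list of triples rather than as a dict. The helper name `check_symenc` and the triple layout are hypothetical. Only `codepoint2name` comes from the standard library.

```python
from html.entities import codepoint2name


def check_symenc(entries):
    """Check (code point, Symbol character, entity name) triples.

    Returns a list of (code point, claimed name, problem) tuples;
    an empty list means every entry is consistent.
    """
    problems = []
    seen = set()
    for cp, _glyph, name in entries:
        # A repeated key would overwrite an earlier entry in a dict literal.
        if cp in seen:
            problems.append((cp, name, 'duplicate code point'))
        seen.add(cp)
        # The comment name should match the HTML 4 name for that code point.
        actual = codepoint2name.get(cp)
        if actual != name:
            problems.append((cp, name, 'HTML 4 name is %r' % actual))
    return problems


print(check_symenc([(8743, '\331', 'and'), (8744, '\332', 'or'),
                    (8869, '^', 'perp'), (962, 'V', 'sigmaf')]))
print(check_symenc([(8869, '\331', 'and'), (8869, '^', 'perp')]))
```

Running this over the full list catches both kinds of slip: a key that appears twice, and a code point whose comment names a different character.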

After making these two additions to platypus/paraparser.py, you can print
all kinds of mathematical symbols and Greek letters using unnamed character
entities such as &#946; or &#8747;. The great advantage is that the same
data can be displayed in any web browser and in ReportLab PDFs without any
re-encoding. It would be great if this patch could be included in future
ReportLab versions.

Christoph Zwerschke
University of Heidelberg, Germany
(zwerschke at zuv.uni-heidelberg.de)