[reportlab-users] Platypus XML markup suggestion
Christoph Zwerschke
reportlab-users@reportlab.com
Thu, 22 May 2003 16:57:04 +0200
This is my first posting in this list, and I want to first express thanks
for this great and useful tool. I am using it to dynamically generate
documents from a web application based on Webware. More concretely, I produce
PDF sheets with guidelines for handling hazardous materials from a database
with hazardous substances.
Here is a suggestion for improving the Platypus intra-paragraph markup.
In the current version it is possible to print a German umlaut a (ä) by
using the unnamed (numeric) entity &#228;, but not by using the named entity
&auml;. Conversely, it is possible to print a Greek letter alpha by using the
named entity &alpha;, but not by using the equivalent unnamed entity &#945;.
This behaviour is quite inconsistent. My database contains substance names
with German umlauts and Greek letters. I generally store them as unnamed
entities because some web browsers do not support all named entities or
spell them differently. I then have the problem that Greek letters and
symbols such as "less than or equal" (used, for instance, when specifying
concentrations in percent) are not visible in the PDF documents produced by
ReportLab.
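To illustrate the correspondence between the two notations, here is a minimal sketch in present-day Python 3 that rewrites named entities as unnamed ones using the standard html.entities table. The helper name to_numeric_entities is hypothetical; it belongs neither to ReportLab nor to the patch described in this message.

```python
import re
from html.entities import name2codepoint

# Matches named entities such as &alpha; or &auml;
_NAMED_ENTITY = re.compile(r'&([A-Za-z][A-Za-z0-9]*);')

def to_numeric_entities(text):
    """Rewrite known named entities as decimal numeric entities.

    Names that are not in the standard HTML table are left untouched.
    """
    def replace(match):
        codepoint = name2codepoint.get(match.group(1))
        if codepoint is None:
            return match.group(0)
        return '&#%d;' % codepoint
    return _NAMED_ENTITY.sub(replace, text)

print(to_numeric_entities('&auml; &alpha; &le;'))
```

Stored this way, a substance name needs only one spelling per character, which is the form the handler proposed here is meant to accept.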
Here is a solution which solves the general problem and adds support for
unnamed entities, rendered in the Symbol font, to the platypus/paraparser.py
module. You need to insert code in two places. First, add a handle_charref
handler for the unnamed entities. The best place is directly before the
definition of handle_entityref, which starts with the line #### greek script:
    #### add symbol encoding
    def handle_charref(self, name):
        try:
            if name[0] == 'x':
                n = string.atoi(name[1:], 16)
            else:
                n = string.atoi(name)
        except string.atoi_error:
            self.unknown_charref(name)
            return
        if 0 <= n <= 255:
            self.handle_data(chr(n))
        else:
            try:
                c = symenc[n]
            except KeyError:
                self.unknown_charref(name)
                return
            self._push(greek=1)
            self.handle_data(c)
            self._pop(greek=1)
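The number-parsing step of the handler can be tried in isolation. The following is only a sketch for current Python 3, where string.atoi no longer exists and int() takes its place; the function name parse_charref is an assumption made for this example and does not exist in paraparser.py.

```python
def parse_charref(name):
    """Return the code point for a character reference body.

    `name` is the text between '&#' and ';', for example '945' (decimal)
    or 'x3B1' (hexadecimal). Returns None when it is not a valid number.
    """
    try:
        if name[:1] in ('x', 'X'):
            return int(name[1:], 16)
        return int(name, 10)
    except ValueError:
        return None

print(parse_charref('945'), parse_charref('x3B1'), parse_charref('abc'))
```

Returning None here plays the role that the call to unknown_charref plays in the handler.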
Second, you have to add the mapping for the unnamed entities. The best place
is directly before the mapping for the named entities, greeks = {
'alpha':'a', ...
# mapping of xml character entities to symbol encoding
symenc = {
# greek letters
913:'A', # Alpha
914:'B', # Beta
915:'G', # Gamma
916:'D', # Delta
917:'E', # Epsilon
918:'Z', # Zeta
919:'H', # Eta
920:'Q', # Theta
921:'I', # Iota
922:'K', # Kappa
923:'L', # Lambda
924:'M', # Mu
925:'N', # Nu
926:'X', # Xi
927:'O', # Omicron
928:'P', # Pi
929:'R', # Rho
931:'S', # Sigma
932:'T', # Tau
933:'U', # Upsilon
934:'F', # Phi
935:'C', # Chi
936:'Y', # Psi
937:'W', # Omega
945:'a', # alpha
946:'b', # beta
947:'g', # gamma
948:'d', # delta
949:'e', # epsilon
950:'z', # zeta
951:'h', # eta
952:'q', # theta
953:'i', # iota
954:'k', # kappa
955:'l', # lambda
956:'m', # mu
957:'n', # nu
958:'x', # xi
959:'o', # omicron
960:'p', # pi
961:'r', # rho
962:'V', # sigmaf
963:'s', # sigma
964:'t', # tau
965:'u', # upsilon
966:'j', # phi
967:'c', # chi
968:'y', # psi
969:'w', # omega
977:'J', # thetasym
978:'\241', # upsih
982:'v', # piv
# mathematical symbols
8704:'"', # forall
8706:'\266', # part
8707:'$', # exist
8709:'\306', # empty
8711:'\321', # nabla
8712:'\316', # isin
8713:'\317', # notin
8715:'\'', # ni
8719:'\325', # prod
8721:'\345', # sum
8722:'-', # minus
8727:'*', # lowast
8730:'\326', # radic
8733:'\265', # prop
8734:'\245', # infin
8736:'\320', # ang
8743:'\331', # and
8744:'\332', # or
8745:'\307', # cap
8746:'\310', # cup
8747:'\362', # int
8756:'\\', # there4
8764:'~', # sim
8773:'@', # cong
8776:'\273', # asymp
8800:'\271', # ne
8801:'\272', # equiv
8804:'\243', # le
8805:'\263', # ge
8834:'\314', # sub
8835:'\311', # sup
8836:'\313', # nsub
8838:'\315', # sube
8839:'\312', # supe
8853:'\305', # oplus
8855:'\304', # otimes
8869:'^', # perp
8901:'\327', # sdot
9674:'\340', # loz
# technical symbols
8968:'\351', # lceil
8969:'\371', # rceil
8970:'\353', # lfloor
8971:'\373', # rfloor
9001:'\341', # lang
9002:'\361', # rang
# arrow symbols
8592:'\254', # larr
8593:'\255', # uarr
8594:'\256', # rarr
8595:'\257', # darr
8596:'\253', # harr
8656:'\334', # lArr
8657:'\335', # uArr
8658:'\336', # rArr
8659:'\337', # dArr
8660:'\333', # hArr
# miscellaneous symbols
8226:'\267', # bull
8230:'\274', # hellip
8242:'\242', # prime
8254:'`', # oline
8260:'\244', # frasl
8472:'\303', # weierp
8465:'\301', # image
8476:'\302', # real
8482:'\344', # trade
8364:'\240', # euro
8501:'\300', # alefsym
9824:'\252', # spades
9827:'\247', # clubs
9829:'\251', # hearts
9830:'\250' # diams
}
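How the handler uses this table can be sketched with a small excerpt. The snippet below is written for Python 3 and is only an illustration: SYMENC_EXCERPT and route_codepoint are hypothetical names, and the returned tuples merely stand in for the handle_data and _push(greek=1) calls of the real parser.

```python
# A few entries copied from the symenc table for demonstration.
SYMENC_EXCERPT = {
    945: 'a',      # alpha
    946: 'b',      # beta
    8747: '\362',  # int
    8804: '\243',  # le
}

def route_codepoint(n, table=SYMENC_EXCERPT):
    """Decide how a numeric character reference would be rendered.

    Returns ('text', char) for code points 0..255, which go out in the
    current font, ('symbol', char) for code points found in the table,
    which go out in the Symbol font, and ('unknown', None) otherwise.
    """
    if 0 <= n <= 255:
        return ('text', chr(n))
    if n in table:
        return ('symbol', table[n])
    return ('unknown', None)

for codepoint in (228, 946, 8804, 12354):
    print(codepoint, route_codepoint(codepoint))
```

Code points outside both ranges are reported as unknown rather than silently dropped, matching the unknown_charref branch of the handler.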
After making these two additions in platypus/paraparser.py, you can print
all kinds of mathematical symbols and Greek letters using unnamed character
entities such as &#946; or &#8747;. The great advantage is that the same
information can be displayed in any web browser and in ReportLab PDFs
without any re-encoding. It would be great if this patch could be included
in future ReportLab versions.
Christoph Zwerschke
University of Heidelberg, Germany
(zwerschke at zuv.uni-heidelberg.de)