[reportlab-users] Experimental early serializing pdfdoc.py for

Fri Apr 15 11:24:51 EDT 2005

Hi,

 > I don't really mind who cleans up the patch. It's almost workable now,
 > but needs some refinement. Is the way to do this to make a branch and
 > allow you write access?

   I'm going through it at the moment. Since I'm familiar with what I 
did, I think it is best if I drop at least the _pdfdata thingy myself.

   A branch sounds good, and write access to that branch, at least 
temporarily, would be useful.

> There are a number of issues related to compatibility. First off we need 
> to know how to deal with multiple pass documents and the various 
> filetypes. I think it's simplistic to assume we always have true file 
> like objects; writing to a socket doesn't allow tell and truncate etc etc.

   The PDFFile class is very simple; one can imagine an overridable 
IPDFFile interface. The filename argument to Canvas and PDFDocument 
could then be either:
   an IPDFFile, which becomes the doc.File directly, or
   a string or any object having a "write" method, for which a PDFFile 
is instantiated as usual.
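
   To make the idea concrete, here is a minimal sketch of how the 
filename argument could be resolved. Everything in it is an assumption 
for illustration: IPDFFile does not exist yet, SimplePDFFile only stands 
in for the real PDFFile, and as_pdf_file is a made-up helper name. The 
wrapper counts the bytes it writes itself, so the underlying object only 
needs "write" (no tell or truncate, which a socket would not have).

```python
import io


class IPDFFile:
    """Hypothetical interface: what a document needs from its output."""

    def write(self, data):
        raise NotImplementedError

    def tell(self):
        raise NotImplementedError


class SimplePDFFile(IPDFFile):
    """Illustrative default wrapper around any object with write()."""

    def __init__(self, stream):
        self.stream = stream
        self.offset = 0  # counted here, so the stream needs no tell()

    def write(self, data):
        self.stream.write(data)
        self.offset += len(data)

    def tell(self):
        return self.offset


def as_pdf_file(filename):
    """Turn the 'filename' argument into an IPDFFile (assumed helper)."""
    if isinstance(filename, IPDFFile):
        return filename  # used directly as the document's file
    if isinstance(filename, str):
        return SimplePDFFile(open(filename, "wb"))
    if hasattr(filename, "write"):
        return SimplePDFFile(filename)
    raise TypeError("expected an IPDFFile, a path or a writable object")


# Usage: a plain in-memory buffer gets wrapped automatically.
buf = io.BytesIO()
out = as_pdf_file(buf)
out.write(b"%PDF-1.3\n")
```

   Code that needs byte offsets (for the xref table, say) would then ask 
the wrapper rather than the raw stream.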

> I would like to ask whether the early serialized version with a StringIO 
> file is equivalent in speed/memory usage to the existing late 
> serializing version. 

   I'll check that when I run the tests with the cleaned-up version. As 
it stands, the early serializer already uses StringIO when the filename 
is not given or is None.

> Can we simulate the late serializer with the early 
> one + a special file (ie a list) and some extra dictionaries etc etc? 
> That would be a best of both worlds approach.

   What I was missing when I made my changes is a criterion for when an 
object is 'finished', that is, when it has all the data needed for 
formatting. The early serializer is basically the same as the late one; 
it only makes 'educated' guesses about when objects can be serialized 
(it assumes it can serialize at a page break). One now only has to call 
.setForwardReference() on an object and it goes to the late serializer. 
What is still missing (it has to be done manually for now) is a method 
to clear that flag. In my cleanup I actually planned to add a 
.finalize(document) method which would clear that flag and serialize 
the object.
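
   Roughly, the behaviour I have in mind looks like the sketch below. It 
is a toy model, not the patch: SketchObject and SketchDocument are 
invented names, the attribute names are assumptions, and the "format" 
written out is just a text line. Only .setForwardReference() and the 
planned .finalize(document) correspond to what is described above.

```python
import io


class SketchObject:
    """Toy stand-in for a PDF object (hypothetical)."""

    def __init__(self, name, body=None):
        self.name = name
        self.body = body
        self.forwardReference = False  # assumed flag name

    def setForwardReference(self):
        # Not finished yet: keep it out of the early serializer.
        self.forwardReference = True

    def finalize(self, document):
        # Planned method: clear the flag and serialize right away.
        self.forwardReference = False
        document.serialize(self)


class SketchDocument:
    """Toy document that serializes early at page breaks."""

    def __init__(self, out):
        self.out = out
        self.unwritten = []

    def add(self, obj):
        self.unwritten.append(obj)
        return obj

    def serialize(self, obj):
        self.unwritten.remove(obj)
        self.out.write("%s = %s\n" % (obj.name, obj.body))

    def pageBreak(self):
        # Educated guess: unflagged objects are complete by now.
        for obj in list(self.unwritten):
            if not obj.forwardReference:
                self.serialize(obj)

    def save(self):
        # Late serialization of whatever is still held back.
        for obj in list(self.unwritten):
            self.serialize(obj)


out = io.StringIO()
doc = SketchDocument(out)
page = doc.add(SketchObject("page1", "text"))
outline = doc.add(SketchObject("outline"))
outline.setForwardReference()
doc.pageBreak()            # writes page1, holds back outline
outline.body = "1 entry"
outline.finalize(doc)      # outline is complete, write it now
```

   Anything never finalized is simply written by save(), which is the 
late serializer's behaviour.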

> 
> As for an API I guess we'd need to allow this to be decided at run time; 
> the implication being that we'd need some kind of pdfdoc object. A 
> module can do as a start.

    It will be tricky to have the two versions cohabit peacefully given 
the existing code base. There are many imports of 
reportlab.pdfbase.pdfdoc now, and clearly the ES (early serializing) 
objects are not compatible with the LS (late serializing) ones. But I'm 
more than willing to make the ES pdfdoc.py a pdfdoc_es.py.

    Thomas Blatter