[reportlab-users] A riddle...

Dinu Gherman reportlab-users@reportlab.com
Fri, 19 Sep 2003 19:04:16 +0200


Andy Robinson:

> Interesting news, Dinu...thanks.  Is this a bug report?
> I feel like I have missed the start of the story.

The start of the story was that I wanted to find out whether there are
any unexpected differences between RLTK versions in the files the toolkit
generates itself. So I invested a day of work to find out. But no, it is
not a bug report (yet). You should probably care more about potential
diffs than I do. ;-)

> This sounds cool.  We're moving to new offices and setting
> up a bigger 'test workbench' in October and we should try to
> do this ourselves.  Can you enlighten us as to what is doing
> the rasterizing?  Is it a Mac OS X specific thing?

Yes. On Mac OS X, PDF is a built-in "datatype". But I guess I could swap
this out and use Ghostscript, too. I'll have to play with that...

> Well, we aren't going to backport distutils to old
> versions.  I still think our setup script only
> half works :-)

I know that very well! ;-) Even 1.17 has its gotchas...

> Here we are working on something very important.  Currently,
> if you run the same test script multiple times, you get
> different PDFs.  This is because PDF is supposed to contain
> unique document IDs, and we escape these so that 16 effectively
> random bytes in the ID can be a varying length escaped string.
> Also, you get different output between Python versions and
> especially between CPython/Jython, because we have used
> things like objects' addresses in memory as comments and
> because we use str() to format numbers in the PDF file.
>
> We are currently debugging/finishing an 'invariant mode'. This
> means that when you do Canvas(..., invariant=1) you should
> get totally repeatable results.  This makes regression
> testing possible even without rasterizing PDF files.
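
To make the str() point above concrete, here is a minimal sketch of
formatting numbers with a fixed precision, so that the text written into
a file does not depend on how a given interpreter renders floats. The
helper name fmt_number and the choice of four decimal places are
assumptions for illustration only; this is not ReportLab's actual code.

```python
def fmt_number(value, places=4):
    """Format a number with fixed precision, independent of str()/repr()."""
    text = "%.*f" % (places, value)
    # Drop trailing zeros and a dangling decimal point: "1.5000" -> "1.5"
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    # Tiny negative values round to "-0"; normalise that to "0"
    if text == "-0":
        text = "0"
    return text
```

With this, a value such as 0.1 + 0.2 is always written as 0.3, whatever
str() would have produced for it.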

Well, in the end it's what a PDF looks like on screen and on paper
that is important, not so much the PDF code itself. There is an
infinite number of possible files which all produce documents looking
exactly the same.

When inspecting the PDF code you are interested in maintaining a
stable PDF production process while varying many configuration
parameters of your environment, as you mentioned. On the other hand,
when inspecting rasterized documents (their look) you are interested in
the evolution of your software, i.e. in adding features without
affecting the "basement". In some sense, the former can be seen as the
ISO-9000 view while the latter is XP. ;-)

> The only sane way to test all of this is to run the test suite
> and compare all PDFs produced byte-for-byte with CPython.
> Hence "invariant mode".
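
A byte-for-byte comparison of two test runs could be sketched roughly as
follows, assuming each run has written its PDFs into its own directory.
The function names file_digest and compare_runs and the directory layout
are hypothetical, not part of the RLTK test suite.

```python
import hashlib
from pathlib import Path

def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def compare_runs(dir_a, dir_b):
    """Map each PDF name found in either directory to True if both
    copies exist and have identical bytes, else False."""
    names_a = {p.name for p in Path(dir_a).glob("*.pdf")}
    names_b = {p.name for p in Path(dir_b).glob("*.pdf")}
    result = {}
    for name in sorted(names_a | names_b):
        if name in names_a and name in names_b:
            result[name] = (file_digest(Path(dir_a) / name)
                            == file_digest(Path(dir_b) / name))
        else:
            # A file missing on one side counts as a difference
            result[name] = False
    return result
```

Any entry that maps to False points at a file worth inspecting by hand.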

It is easy to imagine many new features you might want to add to the
RLTK, like linearization (sic!), say, which will significantly change
the PDF code without making any difference to the document's look!
So I'd claim that comparing rasterized PDFs has its applications.
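
As a rough sketch of what such a comparison might look like with
Ghostscript instead of the Mac OS X rasterizer: the first helper only
builds a command line that renders each page to a PNG file, and the
second compares two already decoded pixel buffers of equal size. The
helper names, the png16m device and the 72 dpi default are my
assumptions here; decoding the PNGs into raw pixel bytes is left to an
imaging library and not shown.

```python
def gs_command(pdf_path, out_pattern, dpi=72):
    """Build a Ghostscript command line that renders each page to a PNG."""
    return [
        "gs", "-dBATCH", "-dNOPAUSE", "-dSAFER",
        "-sDEVICE=png16m",                 # 24-bit colour PNG output
        "-r%d" % dpi,                      # resolution in dots per inch
        "-sOutputFile=%s" % out_pattern,   # e.g. "page-%03d.png"
        pdf_path,
    ]

def differing_fraction(pixels_a, pixels_b):
    """Fraction of positions at which two raw pixel buffers differ."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("buffers differ in size")
    if not pixels_a:
        return 0.0
    diffs = sum(1 for a, b in zip(pixels_a, pixels_b) if a != b)
    return diffs / len(pixels_a)
```

The list returned by gs_command can be passed to subprocess.run(); a
differing_fraction of 0.0 for every page means the two PDFs look the
same at that resolution, however different their code is.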

> What has driven this, which may excite a few people, is that
> we are trying to have reportlab "Jython-certified".  Close
> CVS watchers may have seen a few
>   'if sys.platform[0:4] == "java"'
> lines creeping in.  We're making sure it can use java.awt.image
> or PIL depending on the platform, and writing _rl_accel.java
> which we will check in when it produces identical results.
> This should result in a ReportLab Toolkit that "just works" on
> Java with reasonable performance.
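
The kind of platform test quoted above can be wrapped up as in the
following minimal sketch. The function names are hypothetical and the
backend is returned only as a name; the actual toolkit code may look
quite different.

```python
import sys

def running_on_jython(platform=None):
    """True if the platform string starts with "java", as it does on Jython."""
    if platform is None:
        platform = sys.platform
    return platform[0:4] == "java"

def image_backend_name(platform=None):
    """Name of the imaging backend to use: java.awt.image on Jython,
    PIL everywhere else."""
    if running_on_jython(platform):
        return "java.awt.image"
    return "PIL"
```

Passing the platform string explicitly makes both branches testable
from CPython alone.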

If I remember correctly, it was actually me who added the first set
of such tests after investigating Java compliance, shortly before
1.14 came out (my local Jython email archives don't contain that
anymore):

   http://aspn.activestate.com/ASPN/Mail/Message/1217309

Regards,

Dinu

--
Dinu C. Gherman
......................................................................
"I want to put a ding in the universe." (Steve Jobs)