[Scons-dev] SCons performance investigations

Fri Jul 21 11:39:41 EDT 2017

Hi scons-dev -

The following is a revised draft of an email that I had originally intended
to send as a follow up to
https://pairlist4.pair.net/pipermail/scons-users/2017-June/006018.html.
Instead, Bill Deegan and I took some time to expand on my first draft and
add some ideas about how to address some of th e issues. We hope to migrate
this to the wiki, but wanted to share it here first for feedback.

----

Performance is one of the major challenges facing SCons. When compared with
other current options, particularly Ninja, in many cases performance can
lag significantly. That said other options by and large lack the
extensibility and many features of SCons.

Bill Deegan (SCons project co-manager) and I have been working together to
understand some of the issues that lead to poor SCons performance in a real
world (and fairly modestly sized) C++ codebase. Here is a summary of some
of our findings:

   -

   Python code usage: There are many places in the codebase where while the
   code is correct, performance based on cpython’s implementation can be
   improved by minor changes.
   -

      Examples
      -

         Using for loops and hashes to uniquify a list. Simple change in
         Node class yielded approximately 15% speedup for null build
         -

         Using if x.find(‘some character’) >=0 instead of is ‘some
         character’ in x (timeit benchmark shows a 10x speed difference)
         -

      Method to address
      -

         Profile the code looking for hotspots with cprofile and
         line_profiler. Then look for best implementations of code.
(Use timeit if
         useful to compare implementations. There are examples of such
in the bench
         dir (see:
         https://bitbucket.org/scons/scons/src/68a8afebafbefcf88217e9e778c1845db4f81823/bench/?at=default
         )
         -

   Serial DAG traversal: SCons walks the DAG to find out of date targets in
   a serial fashion. Once it finds them, it farms the work out to other
   threads, but the DAG walk remains serial. Given the proliferation of
   multicore machines since SCons’ initial implementation, a parallel walk of
   the DAG would yield significant speedup. Likely this would require
   implementation using the multiprocessing python library (instead of
   threads), since the GIL would block real parallelism otherwise. Packages
   like Boost where there are many header files can cause large increases in
   the size of the DAG, exacerbating this issue. There are two serious
   consequences of the slow DAG walk:
   -

      Incremental rebuilds in large projects. Typical developer workflow is
      to edit a file, rebuild, test. In our modestly sized codebase, we see the
      incremental time to do an ‘all’ rebuild for a one file change can reach
      well over a minute. This time is completely dominated by the serial
      dependency walk.
      -

      Inability to saturate distributed build clusters. In a
      distcc/icecream build, the serial DAG walk is slow enough that not enough
      jobs can be farmed out in parallel to saturate even a modest (400 cpu)
      build cluster. In our example, using ninja to drive a distributed full
      build results in an approximately 15x speedup, but SCons can
only achieve a
      2x speedup.
      -

      Method to address:
      -

         Investigate changing tree walk to generator
         -

         Investigate implementing tree walk using multiprocessing library
         -

   The dependency graph is the python object graph: The target dependency
   DAG is modeled via python Node Object to Node Object linkages (e.g. a list
   of child nodes held in a node). As a result, the only way to determine
   up-to-date-ness is by deeply nested method calls that repeatedly traverse
   the Python object graph. An attempt is made to mitigate this by memoizing
   state at the leaves (e.g. to cache the result of stat calls), but this
   still results in a large number of python function invocations for even the
   simplest state checks, where a result is already known. Similarly, the lack
   of global visibility precludes using externally provided change information
   to bypass scans.
   -

      See above re generator
      -

      Investigate modeling state separately from the python Node graph via
      some sort of centralized scoreboarding mechanism, it seems
likely that both
      the function call overhead could be eliminated and that local knowledge
      could be propagated globally more effectively.
      -

   CacheDir: There are some issues listed below. End-to-end caching
   functionality of SCons, including generated files, object files, shared
   libraries, whole executables, etc., is one of its great strengths, but its
   performance has much room for improvement.
   -

      Existing bug(s) when combining CacheDir with MD5-Timestamp devalues
      CacheDir.
      -

         Bug: http://scons.tigris.org/issues/show_bug.cgi?id=2980
         -

      Performance issues:
      -

         CacheDir re-creates signature data when extracting nodes from the
         Cache, even though it could have recorded the signature when
entering the
         objects into the cache.
         -

      Method to address
      -

         Store signatures for items in cachedir and then use them directly
         when copying items from Cache.
         -

         Fix the CacheDir / MD5-Timestamp integration bug
         -

   SConsign generation: The generation of the SConsign file is monolithic,
   not incremental. This means that if only one object file changed, the
   entire database needs to be re-written. It also appears that the mechanism
   used to serialize it is itself slow. Moving to a faster serialization model
   would be good, but even better would be to move to a faster serialization
   model that also admitted incremental updates to single items.
   -

      Method to address:
      -

         Replace sconsign with something faster than the current
         implementation, which is based on Pickle.
         -

         And/or Improve sconsign with something which can incrementally
         only write that which has changed.
         -

   Configure check performance: Even cached Configure checks seems slow,
   and for a complexly configured build this can add significant startup cost.
   Improvements here would be useful.
   -

      Method to address:
      -

         Code inspection, look for improvements
         -

         Profile
         -

   Variable Substitution: Currently variable substitution, which is largely
   used to create the command lines run by SCons, uses an appreciable
   percentage (approximately 18% for a null incremental build) of SCons’ CPU
   runtime. By and large much of this evaluation is duplicate (and thus
   avoidable work). For the moderate sized build discussed above there are
   approximately 100k calls to evaluation substitutions. There are only 413
   unique strings to be evaluated. Consider that the CXXCOM variable is
   expanded 2412 times for this build. The only variables which are guaranteed
   unique are the SOURCES and TARGETS, all others could be evaluated once and
   cached.
   -

      Prior work on this item:
      -

         https://bitbucket.org/scons/scons/wiki/SubstQuoteEscapeCache/Discussion
         -

      Working doc on current and areas for improvement:
      -

         https://bitbucket.org/scons/scons/wiki/SubstQuoteEscapeCache/SubstImprovement2017
         -

      Method to address:
      -

         Consider pre-evaluating Environment() variables where reasonable.
         This could use some sort of copy-on-write between cloned
Environments. This
         pre-evaluation would skip known target specific variables
         (TARGET,SOURCES,CHANGED_SOURCES, and a few others), so
minimally the per
         command line substitution should be faster.

Bill and I would appreciate any feedback or thoughts on the above items, or
suggestions for other areas to investigate. We are hoping that by
addressing some or all of these items, the runtime overhead of SCons could
be brought down significantly enough to re-render it competitive with other
build systems. We hope to begin work on the above items once SCons 3.0 has
shipped.

Thanks,
Andrew
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://pairlist2.pair.net/pipermail/scons-dev/attachments/20170721/e88d647d/attachment-0001.html>