[Scons-dev] SCons performance investigations

Daniel Holth dholth at gmail.com
Fri Jul 21 14:51:54 EDT 2017


At least in pypy a json sconsign would be quite fast compared to pickle.

On Fri, Jul 21, 2017, 11:39 Andrew C. Morrow <andrew.c.morrow at gmail.com>
wrote:

>
> Hi scons-dev -
>
> The following is a revised draft of an email that I had originally
> intended to send as a follow up to
> https://pairlist4.pair.net/pipermail/scons-users/2017-June/006018.html.
> Instead, Bill Deegan and I took some time to expand on my first draft and
> add some ideas about how to address some of th e issues. We hope to migrate
> this to the wiki, but wanted to share it here first for feedback.
>
> ----
>
> Performance is one of the major challenges facing SCons. When compared
> with other current options, particularly Ninja, in many cases performance
> can lag significantly. That said other options by and large lack the
> extensibility and many features of SCons.
>
> Bill Deegan (SCons project co-manager) and I have been working together to
> understand some of the issues that lead to poor SCons performance in a real
> world (and fairly modestly sized) C++ codebase. Here is a summary of some
> of our findings:
>
>
>    -
>
>    Python code usage: There are many places in the codebase where while
>    the code is correct, performance based on cpython’s implementation can be
>    improved by minor changes.
>    -
>
>       Examples
>       -
>
>          Using for loops and hashes to uniquify a list. Simple change in
>          Node class yielded approximately 15% speedup for null build
>          -
>
>          Using if x.find(‘some character’) >=0 instead of is ‘some
>          character’ in x (timeit benchmark shows a 10x speed difference)
>          -
>
>       Method to address
>       -
>
>          Profile the code looking for hotspots with cprofile and
>          line_profiler. Then look for best implementations of code. (Use timeit if
>          useful to compare implementations. There are examples of such in the bench
>          dir (see:
>          https://bitbucket.org/scons/scons/src/68a8afebafbefcf88217e9e778c1845db4f81823/bench/?at=default
>          )
>          -
>
>    Serial DAG traversal: SCons walks the DAG to find out of date targets
>    in a serial fashion. Once it finds them, it farms the work out to other
>    threads, but the DAG walk remains serial. Given the proliferation of
>    multicore machines since SCons’ initial implementation, a parallel walk of
>    the DAG would yield significant speedup. Likely this would require
>    implementation using the multiprocessing python library (instead of
>    threads), since the GIL would block real parallelism otherwise. Packages
>    like Boost where there are many header files can cause large increases in
>    the size of the DAG, exacerbating this issue. There are two serious
>    consequences of the slow DAG walk:
>    -
>
>       Incremental rebuilds in large projects. Typical developer workflow
>       is to edit a file, rebuild, test. In our modestly sized codebase, we see
>       the incremental time to do an ‘all’ rebuild for a one file change can reach
>       well over a minute. This time is completely dominated by the serial
>       dependency walk.
>       -
>
>       Inability to saturate distributed build clusters. In a
>       distcc/icecream build, the serial DAG walk is slow enough that not enough
>       jobs can be farmed out in parallel to saturate even a modest (400 cpu)
>       build cluster. In our example, using ninja to drive a distributed full
>       build results in an approximately 15x speedup, but SCons can only achieve a
>       2x speedup.
>       -
>
>       Method to address:
>       -
>
>          Investigate changing tree walk to generator
>          -
>
>          Investigate implementing tree walk using multiprocessing library
>          -
>
>    The dependency graph is the python object graph: The target dependency
>    DAG is modeled via python Node Object to Node Object linkages (e.g. a list
>    of child nodes held in a node). As a result, the only way to determine
>    up-to-date-ness is by deeply nested method calls that repeatedly traverse
>    the Python object graph. An attempt is made to mitigate this by memoizing
>    state at the leaves (e.g. to cache the result of stat calls), but this
>    still results in a large number of python function invocations for even the
>    simplest state checks, where a result is already known. Similarly, the lack
>    of global visibility precludes using externally provided change information
>    to bypass scans.
>    -
>
>       See above re generator
>       -
>
>       Investigate modeling state separately from the python Node graph
>       via some sort of centralized scoreboarding mechanism, it seems likely that
>       both the function call overhead could be eliminated and that local
>       knowledge could be propagated globally more effectively.
>       -
>
>    CacheDir: There are some issues listed below. End-to-end caching
>    functionality of SCons, including generated files, object files, shared
>    libraries, whole executables, etc., is one of its great strengths, but its
>    performance has much room for improvement.
>    -
>
>       Existing bug(s) when combining CacheDir with MD5-Timestamp devalues
>       CacheDir.
>       -
>
>          Bug: http://scons.tigris.org/issues/show_bug.cgi?id=2980
>          -
>
>       Performance issues:
>       -
>
>          CacheDir re-creates signature data when extracting nodes from
>          the Cache, even though it could have recorded the signature when entering
>          the objects into the cache.
>          -
>
>       Method to address
>       -
>
>          Store signatures for items in cachedir and then use them
>          directly when copying items from Cache.
>          -
>
>          Fix the CacheDir / MD5-Timestamp integration bug
>          -
>
>    SConsign generation: The generation of the SConsign file is
>    monolithic, not incremental. This means that if only one object file
>    changed, the entire database needs to be re-written. It also appears that
>    the mechanism used to serialize it is itself slow. Moving to a faster
>    serialization model would be good, but even better would be to move to a
>    faster serialization model that also admitted incremental updates to single
>    items.
>    -
>
>       Method to address:
>       -
>
>          Replace sconsign with something faster than the current
>          implementation, which is based on Pickle.
>          -
>
>          And/or Improve sconsign with something which can incrementally
>          only write that which has changed.
>          -
>
>    Configure check performance: Even cached Configure checks seems slow,
>    and for a complexly configured build this can add significant startup cost.
>    Improvements here would be useful.
>    -
>
>       Method to address:
>       -
>
>          Code inspection, look for improvements
>          -
>
>          Profile
>          -
>
>    Variable Substitution: Currently variable substitution, which is
>    largely used to create the command lines run by SCons, uses an appreciable
>    percentage (approximately 18% for a null incremental build) of SCons’ CPU
>    runtime. By and large much of this evaluation is duplicate (and thus
>    avoidable work). For the moderate sized build discussed above there are
>    approximately 100k calls to evaluation substitutions. There are only 413
>    unique strings to be evaluated. Consider that the CXXCOM variable is
>    expanded 2412 times for this build. The only variables which are guaranteed
>    unique are the SOURCES and TARGETS, all others could be evaluated once and
>    cached.
>    -
>
>       Prior work on this item:
>       -
>
>
>          https://bitbucket.org/scons/scons/wiki/SubstQuoteEscapeCache/Discussion
>          -
>
>       Working doc on current and areas for improvement:
>       -
>
>
>          https://bitbucket.org/scons/scons/wiki/SubstQuoteEscapeCache/SubstImprovement2017
>          -
>
>       Method to address:
>       -
>
>          Consider pre-evaluating Environment() variables where
>          reasonable. This could use some sort of copy-on-write between cloned
>          Environments. This pre-evaluation would skip known target specific
>          variables (TARGET,SOURCES,CHANGED_SOURCES, and a few others), so minimally
>          the per command line substitution should be faster.
>
>
> Bill and I would appreciate any feedback or thoughts on the above items,
> or suggestions for other areas to investigate. We are hoping that by
> addressing some or all of these items, the runtime overhead of SCons could
> be brought down significantly enough to re-render it competitive with other
> build systems. We hope to begin work on the above items once SCons 3.0 has
> shipped.
>
> Thanks,
> Andrew
>
>
> _______________________________________________
> Scons-dev mailing list
> Scons-dev at scons.org
> https://pairlist2.pair.net/mailman/listinfo/scons-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://pairlist2.pair.net/pipermail/scons-dev/attachments/20170721/4c5b7390/attachment-0001.html>


More information about the Scons-dev mailing list