Re: parallel autotest [0/3]
Thu, 29 May 2008 23:42:39 +0200
* Eric Blake wrote on Thu, May 29, 2008 at 02:54:37PM CEST:
> According to Ralf Wildenhues on 5/25/2008 11:47 PM:
> | There are many possible ways to parallelize Autotest testsuite
> | execution, for example:
> I'm still looking for time to review this series in depth, but appreciate
> the work you've put into it so far.
Take your time. This has been simmering here for several months,
there's no reason to rush it in.
> | a) have each test group be a 'make' target in a makefile, then use
> | parallel make.
> | b) leverage the job server from GNU make,
> | c) reimplement in shell a simplified job server a la parallel GNU make,
> | d) implement a "worker thread" parallelization in shell.
> | Features and differences of these approaches:
> | - (a) and (b) need GNU make for parallel execution,
I should clarify: (a) is likely to cope with parallel non-GNU make,
while (b) will ignore non-GNU make parallelism. For example, BSD makes
typically allow parallel execution as well, but do not implement a job server.
> | - AFAICS (b) currently needs a shell that understands 'read -n1',
> | - (d) differs from (c) in that one subprocess executes more than one
> | test group, thus is potentially faster because it forks less,
> | - (b) has the nice feature that it allows parallelizing across multiple
> | test suites, and across testing and other, independent build activity.
> | That means, while (c) and (d) allow
> | make check TESTSUITEFLAGS='-j3'
> | to speed up things, (b) allows
> | gmake -j3 check
> | to profit.
> Interesting trades. I like that (c) and (d) can do parallel execution
> when the testsuite is run manually (without make);
Good point. I would not want to have (b) without any of (c), (d);
rather, I thought of adding (c) and maybe also (b).
> on the other hand,
> since make is usually the driver, I would tend to favor a solution along
> the lines of (b) that lets the testsuite work alongside other processes.
Well, (b) kind of puts './testsuite' on a par with 'gmake', as it then acts
as a distributor of work just like parallel (GNU) make does.
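For illustration, a simplified shell job server in the spirit of (c) can be
built on a named fifo used as a token pool.  The sketch below is hand-rolled
for this mail, not code from the patches; the file names, the token format
(one 'X' per line), and the job count are all invented, and it shares the
"unknown number of race conditions" caveat mentioned further down:

```shell
#!/bin/sh
# Sketch of a fifo-based token pool a la (c): seed the fifo with njobs
# tokens; each worker takes a token before running a test group and
# puts it back afterwards.  Illustrative only; names are made up.
njobs=3
fifo=at-jobserver.fifo
log=at-jobserver.log
rm -f "$fifo" "$log"
mkfifo "$fifo"
# Open the fifo read-write so neither end blocks waiting for a peer.
exec 3<>"$fifo"
# Seed the pool with njobs tokens.
i=0
while test "$i" -lt "$njobs"; do
  echo X >&3
  i=`expr $i + 1`
done
run_group () {
  # Take a token (blocks while njobs groups are already running) ...
  read at_token <&3
  sleep 1                        # stand-in for running one test group
  echo "group $1 done" >>"$log"
  # ... and put it back for the next waiting group.
  echo X >&3
}
for at_group in 1 2 3 4 5; do
  run_group "$at_group" &
done
wait
exec 3<&-
rm -f "$fifo"
cat "$log"
```

With njobs=3, at most three of the five groups run concurrently; the whole
run takes roughly two "test group" durations instead of five.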
> | I've experimented a bit with these approaches. I did not see an easy
> | way to get (a) to work under the restrictions that it may not start the
> | complete testsuite anew for each job: this has very high overhead,
> | and/or it requires that user-provided startup bits like atlocal be
> | idempotent. I must confess that I didn't try very hard, though.
> The goal of still shipping a single 'testsuite' file that contains
> everything needed to create the multiple tests is nice. I agree that
> blindly calling 'testsuite n' for n parallel tests is too much overhead.
> On the other hand, it would sure be nice to call testsuite once with the
> user's TESTSUITEFLAGS to generate the subset of individual files to run,
> then turn around and run those individual files in parallel.
I don't quite understand what the last sentence in this paragraph is
supposed to mean.
In each of (b), (c), you can take any TESTSUITEFLAGS you would currently
use; with (c), add '-j3' to TESTSUITEFLAGS, with (b), add '-j3' to the
gmake command line directly.
> Is there a way to write a Makefile include fragment which gets included if
> we detect GNU make, but is portably ignored for other makes, where we can
> then exploit gmake features to make parallel execution easier?
As I understand this question, the at_parse_makeflags snippet shown in
patch 3/3 does exactly that: it should be a no-op for non-GNU parallel
make, as none of them use '--jobserver-fds='.
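Roughly, such a snippet only has to pull the two file descriptor numbers out
of $MAKEFLAGS.  The following is my paraphrase of the idea, not the actual
patch 3/3 code; the variable names and the example MAKEFLAGS value are
invented here:

```shell
# Paraphrase of the idea behind at_parse_makeflags (not the patch code).
# GNU make passes "--jobserver-fds=R,W" in MAKEFLAGS when run with -jN;
# the example value below stands in for what GNU make would export.
MAKEFLAGS='--jobserver-fds=3,4 -j3'   # example value for illustration
at_jobserver_fds=
case $MAKEFLAGS in
  *--jobserver-fds=*)
    at_jobserver_fds=`expr "x$MAKEFLAGS" : 'x.*--jobserver-fds=\([0-9,]*\)'`
    ;;
esac
if test -n "$at_jobserver_fds"; then
  at_jobserver_r=${at_jobserver_fds%,*}   # fd to read tokens from
  at_jobserver_w=${at_jobserver_fds#*,}   # fd to return tokens to
  echo "jobserver read fd: $at_jobserver_r, write fd: $at_jobserver_w"
else
  # Non-GNU makes never put --jobserver-fds= in MAKEFLAGS, so the
  # whole snippet is a no-op there.
  echo "no GNU make jobserver detected"
fi
```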
If your question is about a general way to include something for GNU
make only, one possibility is to have a GNUmakefile which includes
Makefile plus extra, GNU make-specific code.
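Concretely, the layout could look like this (illustrative, not from the
patches):

```make
# GNUmakefile: read by GNU make in preference to Makefile;
# all other make implementations ignore it entirely.
include Makefile

# GNU make-specific additions (e.g. jobserver-aware rules) go here.
```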
> | The patches (2) and (3) currently both have
> | - an unknown number of remaining race conditions, ;-)
> | - known file descriptor leaks to test groups (not sure whether to view
> | that as a problem or not),
> | - the bug^Wlimitation that, when a parallel run is interrupted, the
> | currently running test groups may still finish (see below also),
> Autotest behavior when using ^C is already fishy (I often find that it
> makes a test report as OK rather than failed, because it kills the test
> group before any failure file could be created). In other words, work to
> improve signal handling within autotest is useful independently of this
> patch series.
Fully agreed; I've seen such fishy behavior, too, but mostly ignored it
up to now. I fear that the behavior depends quite a bit on the sh and
make implementations involved.
> | Right now, the only system where I had significant problems was Cygwin
> | with its seemingly limited named fifo emulation. I expect that, given
> | sufficient interest, somebody will fix this for me. ;-)
> Were you testing with cygwin 1.5.25 or the experimental cygwin 1.7.0? I
> agree that the 1.5.x named fifos don't always work reliably.  I'll
> certainly try to play more with this.
I think 1.5.25. My usual procedure is to boot w32, fire up cygwin
setup.exe, run its update process without changing any settings
manually, then do testing. I assume that, for experimental 1.7.0
I'd have to change some setting, no?
FYI, here's some quick timings on an 8-way GNU/Linux system for -jP,
Autoconf's testsuite, build tree kept on tmpfs, timings in seconds,
efficiency in percent of the sequential case:
P      (b)   eff     (c)   eff
1    435.5   100
2    224.2    97   224.5    97
4    118.5    92   118.6    92
7     78.8    79    78.9    79
8     73.7    74    73.7    74
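For reference, the 'eff' column is just t1 / (P * tP), expressed in percent.
A tiny helper to recompute it (the helper name is made up, it is not part of
the patches):

```shell
# Recompute the "eff" column above: efficiency = t1 * 100 / (P * tP),
# rounded to the nearest percent.  (Helper name is invented.)
at_eff () {
  awk -v t1="$1" -v p="$2" -v tp="$3" \
      'BEGIN { printf "%.0f\n", t1 * 100 / (p * tp) }'
}
at_eff 435.5 2 224.2   # (b) at P=2 -> 97
at_eff 435.5 8 73.7    # (b) at P=8 -> 74
```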
Not very exciting.  I think the drop-off in efficiency is mostly due to
inefficiency in Autotest itself: it is a shell script that forks a lot,
including for many command pipelines, and does a lot of I/O, so even a
sequential run already keeps extra processors busy with that work.  Other
factors are of course the inherently serial parts (Amdahl's law, the
scheduling done by the job server process) and, to a smaller degree, the
load imbalance between the different tests.
Running with a build tree on disk or even NFS has a large impact on
overhead and parallel efficiency (roughly 35% over tmpfs in the
sequential case, and it gets a lot worse with more processes).