parallel autotest [0/3]

autoconf-patches

From:	Ralf Wildenhues
Subject:	parallel autotest [0/3]
Date:	Mon, 26 May 2008 07:47:43 +0200
User-agent:	Mutt/1.5.17+20080114 (2008-01-14)

There are many possible ways to parallelize Autotest testsuite
execution, for example:

a) have each test group be a 'make' target in a makefile, then use
parallel make,
b) leverage the job server from GNU make,
c) reimplement in shell a simplified job server a la parallel GNU make,
d) implement a "worker thread" parallelization in shell.

Features and differences of these approaches:

- (a) and (b) need GNU make for parallel execution,
- AFAICS (b) currently needs a shell that understands 'read -n1',
- (d) differs from (c) in that one subprocess executes more than one
  test group, thus is potentially faster because it forks less,
- (b) has the nice feature that it allows parallelizing across multiple
  test suites, and across testing and other, independent build activity.
  That means, while (c) and (d) allow
      make check TESTSUITEFLAGS='-j3'
  to speed up things, (b) allows
      gmake -j3 check
  to profit.
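To make the 'read -n1' requirement concrete, here is a minimal sketch of
how a script might probe for it.  The variable name have_read_n1 is
made up for illustration, and this is not code from the patches:

```sh
#!/bin/sh
# Probe whether this shell's 'read' accepts -n1 (read exactly one character).
# Runs in a subshell with stderr silenced, so a shell that rejects the
# option only causes the probe to report "no".
if (echo xy | { read -n1 c && test "$c" = x; }) 2>/dev/null; then
  have_read_n1=yes
else
  have_read_n1=no
fi
echo "read -n1 supported: $have_read_n1"
```

A shell that fails this probe could still use (c) or (d), which do not
depend on reading single bytes.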

I've experimented a bit with these approaches.  I did not see an easy
way to get (a) to work under the restriction that it must not start the
complete testsuite anew for each job: doing so has very high overhead
and/or requires that user-provided startup bits like atlocal be
idempotent.  I must confess that I didn't try very hard, though.

This patch series consists of
1) a preparatory patch refactoring the driver loop,
2) an experimental patch implementing (c) if the system has mkfifo,
3) an *experimental* patch supporting (b) if GNU make is used, and the
   shell can 'read -n1'.
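
The idea behind (c) can be illustrated with a small sketch.  This is not
the code from patch (2), only an assumption-laden illustration: a named
fifo holds one token per job slot, the driver takes a token before
starting a test group, and each finished group puts its token back.  The
names run_group, njobs and groups are hypothetical, and the sketch
assumes mktemp -d and a system where a fifo can be opened read-write:

```sh
#!/bin/sh
# Sketch of a simplified job server: a named FIFO holds one token per slot.
# run_group, njobs and groups are hypothetical stand-ins.

njobs=3
groups="1 2 3 4 5 6"

tmpdir=$(mktemp -d) || exit 1
mkfifo "$tmpdir/tokens" || exit 1

# Open the FIFO read-write on fd 9 so that the open itself never blocks.
# (POSIX leaves read-write opens of a FIFO unspecified; this is an
# assumption that holds on common systems.)
exec 9<>"$tmpdir/tokens"

# Stand-in for running one test group.
run_group () {
  echo "group $1 ok" > "$tmpdir/result.$1"
}

# Put one single-byte token (a newline) into the FIFO per job slot.
i=0
while test "$i" -lt "$njobs"; do
  echo >&9
  i=$((i + 1))
done

for group in $groups; do
  # Block until a token is available, i.e. until a job slot is free.
  read token <&9
  (
    # Note: the test group inherits fd 9 here.
    run_group "$group"
    # Hand the token back once the test group has finished.
    echo >&9
  ) &
done
wait

completed=$(ls "$tmpdir" | grep -c '^result\.')
exec 9>&-
rm -rf "$tmpdir"
echo "$completed test groups completed"
```

At most njobs test groups run at any time, because the loop cannot start
another one until some running group has returned its token.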

The patches (2) and (3) currently both have
- an unknown number of remaining race conditions,  ;-)
- known file descriptor leaks to test groups (not sure whether to view
  that as a problem or not),
- the bug^Wlimitation that, when a parallel run is interrupted, the
  currently running test groups may still finish (see below also),
- tests that are probably still too strict.

Right now, the only system where I had significant problems was Cygwin
with its seemingly limited named fifo emulation.  I expect that, given
sufficient interest, somebody will fix this for me.  ;-)
(Of course, (c) doesn't work with MinGW, as it has no named fifos.)

Also, I am reluctant to apply (3) before asking for permission on the
GNU make mailing list.

I guess I should comment on (d):
I tried out an implementation where an empty directory is created for each
test group.  Each worker thread tries to rmdir the next available dir,
and if it succeeds, runs that test group, and writes out the results in
a (different) per-test-group directory.  This works, doesn't cause too
much "lock contention" on practical loads, but could do so in theory.
However, it is a bit tough on the file system, 'make check' on tmpfs
being noticeably faster than on NFS.  (Probably the latter holds for
the upcoming patches too, hopefully not as badly though; I haven't
measured).
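
A minimal sketch of this rmdir-based scheme, under the assumption that
rmdir on the file system in use is atomic, so that only one worker can
succeed for a given directory.  The layout (todo/ and done/), the worker
function and the group list are made up for illustration and are not
taken from my experimental implementation:

```sh
#!/bin/sh
# Sketch of worker-style parallelization where rmdir acts as the lock.
# Directory layout and names are hypothetical.

groups="1 2 3 4 5 6 7 8"
tmpdir=$(mktemp -d) || exit 1
mkdir "$tmpdir/todo" "$tmpdir/done" || exit 1

# One empty directory per test group that still waits to be run.
for g in $groups; do
  mkdir "$tmpdir/todo/$g"
done

worker () {
  for g in $groups; do
    # rmdir succeeds for exactly one worker, which thereby claims group $g;
    # every other worker sees a failure and moves on to the next group.
    if rmdir "$tmpdir/todo/$g" 2>/dev/null; then
      # Results go to a different, per-test-group directory.
      mkdir "$tmpdir/done/$g"
      echo "group $g run by worker $1" > "$tmpdir/done/$g/result"
    fi
  done
}

worker 1 &
worker 2 &
worker 3 &
wait

claimed=$(ls "$tmpdir/done" | wc -l | tr -d ' ')
leftover=$(ls "$tmpdir/todo" | wc -l | tr -d ' ')
rm -rf "$tmpdir"
echo "$claimed groups run, $leftover left unclaimed"
```

Every failed rmdir is a wasted file system operation, which is where the
"lock contention" and the load on the file system come from.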

I haven't tried this alternative implementation for (d) yet:
When ready for another test group, a worker thread p writes one byte
which encodes $p, to pipe-M, then waits reading the number of the next
test group from the per-worker pipe-$p.  This limits parallelism to
250-odd workers, as pipes do not provide for atomic transfer of multiple
bytes.  I even wonder whether we may count on data not being dropped
with multiple writers to a pipe (this may be a reason for the Cygwin
issues mentioned above).

A comment on interrupting execution:
My incomplete testing and exploration of interrupting the testsuite has
not yet turned up a decent and portable way to do it.  Some shells seem
to behave sanely; with others, I wonder whether I have merely assumed too
much bash-like semantics or whether it is really impossible to do right.
Non-unix systems are a completely different story, too.  Anyway, it would
be good to have this fixed before parallel autotest becomes a
non-experimental feature.

Comments, reports etc. appreciated.

The GNU make job server is described in
<http://make.mad-scientist.us/jobserver.html>.

Cheers,
Ralf
