From: Stefano Lattarini
Subject: Re: bug#8788: Weird testsuite failure on NetBSD (parallel tests, background processes)
Date: Tue, 18 Oct 2011 23:16:25 +0200
User-agent: KMail/1.13.5 (FreeBSD/8.2-RELEASE; KDE/4.5.5; i386; ; )


[Adding bug-autoconf in CC]

On Thursday 02 June 2011, Stefano Lattarini wrote:
> Hello automakers.
> While testing the `testsuite-work' branch on NetBSD 5, I've encountered
> a weird failure in the test `parallel-tests3.test', which actually caused
> the whole testsuite to crash (!) due to a stray SIGTERM.
> [SNIP]
> Any idea of what's going on?
Ah ah, got it! (I think).  The failure is due to an interaction between some
features of GNU make and some (mis)features of the NetBSD Korn shell.  Let's
see the details.

[1] The Korn shell gets selected to run the Makefile recipes

On NetBSD, an autoconf-generated configure script will select /bin/ksh as
the $(SHELL) used to execute the Makefile recipes:
  $ grep 'SHELL.*=' tests/parallel-tests3.dir/*/config.log

[2] The Korn shell has quirks w.r.t. signal handling

NetBSD's Korn shell is one of those shells that try to "propagate"
terminating signals, as explained in the ``Signal Handling'' node of the
bleeding-edge (and, as of today, still unreleased) autoconf manual; see also
these relevant links:


And in fact, NetBSD's Korn shell even seems to propagate a fatal signal
it has received *to its whole process group*!  Let's see a few examples:

 $ /bin/sh -c '/bin/sh -c "kill -15 \$\$"; echo alive'
 [1]   Terminated              /bin/sh -c "kill...

 $ /bin/ksh -c '/bin/sh -c "kill -15 \$\$"; echo alive'

 # ksh apparently terminates its parent
 $ /bin/sh -c '/bin/ksh -c "kill -15 \$\$"; echo alive'

 $ /bin/ksh -c '/bin/ksh -c "kill -15 \$\$"; echo alive'
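
The four one-liners above can be folded into a small reusable probe. This is
only a sketch and not part of the original report: the function name
`probe_shell' and its two arguments (outer shell, inner shell) are
hypothetical. It reports whether the outer shell is still running after the
inner shell has sent SIGTERM to itself. Because an affected shell may signal
the whole process group, the probe installs a no-op TERM trap on itself while
it runs.

```shell
# Hypothetical helper, not from the original message.
# Usage: probe_shell OUTER_SHELL INNER_SHELL
# Prints "survived" if OUTER_SHELL keeps running after INNER_SHELL kills
# itself with SIGTERM, and "killed" otherwise.
probe_shell () {
  # A caught (as opposed to ignored) signal is reset to its default action
  # in child processes, so this trap shields only the probing shell.
  trap : TERM
  # The outer shell runs the inner one, which sends SIGTERM to its own PID.
  # "exit 0" is reached only if the outer shell is still alive afterwards.
  if "$1" -c "\"$2\" -c 'kill -15 \$\$'; exit 0" >/dev/null 2>&1; then
    probe_result=survived
  else
    probe_result=killed
  fi
  # Note: this restores the default action and would discard any TERM
  # trap the caller had installed beforehand.
  trap - TERM
  echo "$probe_result"
}
```

For instance, `probe_shell /bin/sh /bin/ksh' mirrors the third example above,
and `probe_shell /bin/ksh /bin/ksh' the fourth.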

Just to be sure, let's try to trace the system calls made by the Korn shell:

  $ ktrace /bin/sh -c '
  > echo parent: $$
  > ktrace -a /bin/ksh -c "echo child: \$\$; kill -15 \$\$"
  > echo alive'
  parent: 20429
  child: 4829

  $ kdump ktrace.out | grep -i sig | grep -v __sig
   4829  1 ksh  CALL  kill(0x12dd, SIGTERM)
   4829  1 ksh  PSIG  SIGTERM caught handler=0x420810 mask=(): code=SI_USER sent by pid=4829, uid=1242)
   4829  1 ksh  CALL  kill(0, SIGTERM)
   4829  1 ksh  PSIG  SIGTERM SIG_DFL: code=SI_USER sent by pid=4829, uid=1242)
  20429  1 sh   PSIG  SIGTERM SIG_DFL: code=SI_USER sent by pid=4829, uid=1242)

(Note that `0x12dd' is decimal 4829).

[3] GNU make propagates signals to the running recipes

If GNU make receives a terminating signal while it's updating some target(s), it
propagates that signal to the currently-executing recipe(s):

  $ cat Makefile 
  all: 1 2
  1 2:
       @trap 'echo got SIGTERM; exit 77' 15; while :; do :; done
  $ gmake -j2 &
  [1] 5980
  $ kill $!
  gmake: *** [2] Error 77
  gmake: *** [1] Error 77

(FWIW, I find this to be a helpful and rational behaviour).
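
The recipe's side of that exchange can be reproduced without make. The sketch
below is not from the original report, and the function name
`trap_exit_status' is hypothetical: it starts the same kind of busy loop with
a TERM trap in the background, sends it the default signal of `kill'
(SIGTERM), and prints the exit status the loop ends with. It says nothing
about how make relays the signal; it only illustrates where the `Error 77' in
the output above would come from.

```shell
# Hypothetical helper, not from the original message.
# Prints the exit status of a trapped busy loop after it receives SIGTERM.
trap_exit_status () {
  # Same shape as the recipe above: trap signal 15 (SIGTERM), then spin.
  sh -c 'trap "exit 77" 15; while :; do :; done' &
  loop_pid=$!
  # Crude synchronisation: give the child time to install its trap.
  sleep 1
  # With no signal argument, kill sends SIGTERM.
  kill "$loop_pid"
  # Collect the child's exit status without tripping "set -e".
  loop_status=0
  wait "$loop_pid" || loop_status=$?
  echo "$loop_status"
}
```

Calling `trap_exit_status' should print the status chosen in the trap, as
long as the signal arrives after the trap has been installed.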

[4] Putting it all together

So here is my diagnosis of what happens when `parallel-tests3.test' is
run on NetBSD with GNU make:

 1) various setup/preparation commands get executed in this script; the
    Korn shell gets selected to run the recipe of the Makefile;
 2) "make -j1 check" is launched in the background:
      cd serial
      $MAKE -j1 check &
 3) some more commands get run, and they conclude before the background
    make process launched in (2) has finished;
 4) the shell executing `parallel-tests3.test' explicitly kills the still
    running background "make" process with a SIGTERM:
      cd ..
      kill $!
 5) GNU make "relays" the SIGTERM to the Korn shell executing the still
    running recipe(s);
 6) in turn, the Korn shell relays the SIGTERM to all processes in its
    process group;
 7) this includes the top-level make process that is running the automake
    testsuite (if any); which explains the crash that is the object of
    this bug report.

I'm not 100% positive that point (7) is completely correct, but I'm running
out of time now, so I'll settle for this explanation; kudos to anyone who
can give some confirmation about the correctness of point (7)!


Now, the right fix for the bug is *not* to work around this behaviour
of the Korn shell; rather, we should fix the suspicious logic of the
`parallel-tests3.test' script, which was also causing the testsuite to
hang on FreeBSD.  Patch coming up shortly.

And it goes without saying that this horrendous incompatibility of NetBSD's
Korn shell should be documented in the autoconf manual; I might give it a
shot in the next few days if nobody beats me to it.

