


From: Stefano Lattarini
Subject: Re: bug#8788: Weird testsuite failure on NetBSD (parallel tests, background processes)
Date: Tue, 18 Oct 2011 23:16:25 +0200
User-agent: KMail/1.13.5 (FreeBSD/8.2-RELEASE; KDE/4.5.5; i386; ; )

Reference:
 <http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8788>

[Adding bug-autoconf in CC]

On Thursday 02 June 2011, Stefano Lattarini wrote:
> Hello automakers.
> 
> While testing the `testsuite-work' branch on NetBSD 5, I've encountered
> a weird failure in the test `parallel-tests3.test', which actually caused
> the whole testsuite to crash (!) due to a stray SIGTERM.
> 
> [SNIP]
> 
> Any idea of what's going on?
> 
Ah ah, got it! (I think).  The failure is due to an interaction between
some features of GNU make and some (mis)features of the NetBSD Korn Shell.
Let's see the details.

[1] The Korn shell gets selected to run the Makefile recipes
-------------------------------------------------------------

On NetBSD, an autoconf-generated configure script will select /bin/ksh as
the $(SHELL) used to execute the Makefile recipes:
 
  $ grep 'SHELL.*=' tests/parallel-tests3.dir/*/config.log
  tests/parallel-tests3.dir/parallel/config.log:SHELL='/bin/ksh'
  tests/parallel-tests3.dir/serial/config.log:SHELL='/bin/ksh'

[2] The Korn shell has quirks w.r.t. signal handling
----------------------------------------------------

NetBSD's Korn Shell is one of those shells that try to "propagate"
terminating signals, as explained in the ``Signal Handling'' node of the
bleeding-edge (as of today still unreleased) autoconf manual; see also
these relevant links:

 <http://lists.gnu.org/archive/html/autoconf-patches/2011-09/msg00005.html>
 <https://lists.gnu.org/archive/html/bug-autoconf/2011-09/msg00004.html>
 <http://mail.opensolaris.org/pipermail/ksh93-integration-discuss/2009-February/004121.html>

And in fact, NetBSD's Korn Shell even seems to propagate a fatal signal
it has received *to its whole process group*!  Let's see a few examples:

 $ /bin/sh -c '/bin/sh -c "kill -15 \$\$"; echo alive'
 [1]   Terminated              /bin/sh -c "kill...
 alive

 $ /bin/ksh -c '/bin/sh -c "kill -15 \$\$"; echo alive'
 Terminated 
 alive

 # ksh apparently terminates its parent
 $ /bin/sh -c '/bin/ksh -c "kill -15 \$\$"; echo alive'
 Terminated

 $ /bin/ksh -c '/bin/ksh -c "kill -15 \$\$"; echo alive'
 Terminated 
 Terminated
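The four combinations above can be generalized into a small probe.  This is
only a sketch: `probe_shell' is a hypothetical helper name, the shell paths
are assumptions to be adjusted for the system at hand, and the caller
installs a no-op TERM handler because an affected shell may signal the whole
process group, including the shell that runs the probe.

```shell
# Hypothetical helper: run shell $2 under shell $1, let the inner shell
# send SIGTERM to itself, and report whether the outer shell lived long
# enough to print "alive".
probe_shell() {
    out=$("$1" -c "$2 -c 'kill -15 \$\$'; echo alive" 2>/dev/null) || true
    case $out in
        *alive*) echo "$1 running $2: outer shell survived" ;;
        *)       echo "$1 running $2: outer shell died" ;;
    esac
}

# A handler (unlike an ignored signal) is reset to the default in child
# processes, so this protects only the probing shell itself.
trap : TERM

probe_shell /bin/sh /bin/sh
```

Going by the transcripts above, calling `probe_shell /bin/sh /bin/ksh' on
the NetBSD 5 box should presumably report that the outer shell died, but I
have not run the helper in that form there.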

Just to be sure, let's trace the system calls made by the Korn shell:

  $ ktrace /bin/sh -c '
  > echo parent: $$
  > ktrace -a /bin/ksh -c "echo child: \$\$; kill -15 \$\$"
  > echo alive
  '
  parent: 20429
  child: 4829
  Terminated

  $ kdump ktrace.out | grep -i sig | grep -v __sig
   4829  1 ksh  CALL  kill(0x12dd, SIGTERM)
   4829  1 ksh  PSIG  SIGTERM caught handler=0x420810 mask=(): code=SI_USER sent by pid=4829, uid=1242)
   4829  1 ksh  CALL  kill(0, SIGTERM)
   4829  1 ksh  PSIG  SIGTERM SIG_DFL: code=SI_USER sent by pid=4829, uid=1242)
  20429  1 sh   PSIG  SIGTERM SIG_DFL: code=SI_USER sent by pid=4829, uid=1242)

(Note that `0x12dd' is decimal 4829).
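
The conversion can be checked directly from the shell; a minimal sketch,
assuming a `printf' that accepts C-style hexadecimal constants for numeric
arguments (as POSIX specifies):

```shell
# Convert the hexadecimal PID seen in the kdump output to decimal.
printf '%d\n' 0x12dd
```

So the first kill() call targets the Korn shell itself, while the second
one, with a first argument of 0, targets every process in the caller's
process group.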

[3] GNU make propagates signals to the running recipes
------------------------------------------------------

If GNU make receives a terminating signal while it's updating some target(s), it
propagates that signal to the currently-executing recipe(s):

  $ cat Makefile 
  all: 1 2
  1 2:
       @trap 'echo got SIGTERM; exit 77' 15; while :; do :; done
  $ gmake -j2 &
  [1] 5980
  $ kill $!
  got SIGTERM
  got SIGTERM
  gmake: *** [2] Error 77
  gmake: *** [1] Error 77

(FWIW, I find this to be a helpful and rational behaviour).

[4] Putting it all together
---------------------------

So here is my diagnosis of what happens when `parallel-tests3.test' is
run on NetBSD with GNU make:

 1) various setup/preparation commands get executed in this script; the
    Korn shell gets selected to run the recipe of the Makefile;
 2) "make -j1 check" is launched in the background:
      cd serial
      $MAKE -j1 check &
 3) some more commands get run, and they finish before the background
    make process launched in (2) has finished;
 4) the shell executing `parallel-tests3.test' explicitly kills the still
    running background "make" process with a SIGTERM:
      cd ..
      kill $!
 5) GNU make "relays" the SIGTERM to the Korn shell executing the still
    running recipe(s);
 6) in turn, the Korn shell relays the SIGTERM to all processes in its
    process group;
 7) this includes the top-level make process that is running the automake
    testsuite (if any), which explains the crash that is the object of
    this bug report.

I'm not 100% positive that point (7) is completely correct, but I'm running
out of time now, so I'll settle for this explanation; kudos to anyone who
can give some confirmation about the correctness of point (7)!
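
A partial check of point (7) could look like the sketch below.  It only
shows that a background job started by a non-interactive shell (that is,
without job control) stays in that shell's process group; it does not
inspect the make processes of the real testsuite.  The names `pgid_of' and
`check_background_pgid' are hypothetical, and the sketch assumes a `ps'
that accepts `-o pgid=' together with `-p'.

```shell
# Print the process group ID of the process whose PID is $1.
pgid_of() {
    ps -o pgid= -p "$1" | tr -d ' '
}

# Start a background job the way the test script does, then compare its
# process group with that of the invoking shell.
check_background_pgid() {
    sleep 30 &
    bg_pid=$!
    self_pgid=$(pgid_of "$$")
    bg_pgid=$(pgid_of "$bg_pid")
    # Clean up the helper job before reporting.
    kill "$bg_pid"
    wait "$bg_pid" 2>/dev/null || true
    if [ "$self_pgid" = "$bg_pgid" ]; then
        echo "same process group ($self_pgid)"
    else
        echo "different process groups ($self_pgid vs $bg_pgid)"
    fi
}

check_background_pgid
```

If the same holds for the whole chain from the top-level make down to the
recipe shell, a kill(0, SIGTERM) issued by the Korn shell would reach the
top-level make as well.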

-*-*-*-

Now, the right fix for the bug is *not* to work around this behaviour
of the Korn shell; rather, we should fix the suspicious logic of the
`parallel-tests3.test' script, which was also causing the testsuite to
hang on FreeBSD.  Patch coming up shortly.

And it goes without saying that this horrendous NetBSD Korn Shell
incompatibility should be documented in the autoconf manual; I may
give it a shot in the next few days if nobody beats me to it.

Regards,
  Stefano


