From: Stefano Lattarini
Subject: Re: bug#8788: Weird testsuite failure on NetBSD (parallel tests, background processes)
Date: Tue, 18 Oct 2011 23:16:25 +0200
User-agent: KMail/1.13.5 (FreeBSD/8.2-RELEASE; KDE/4.5.5; i386; ; )
Reference:
<http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8788>
[Adding bug-autoconf in CC]
On Thursday 02 June 2011, Stefano Lattarini wrote:
> Hello automakers.
>
> While testing the `testsuite-work' branch on NetBSD 5, I've encountered
> a weird failure in the test `parallel-tests3.test', which actually caused
> the whole testsuite to crash (!) due to a stray SIGTERM.
>
> [SNIP]
>
> Any idea of what's going on?
>
Ah ah, got it! (I think). The failure is due to an interaction between some
features of GNU make and some (mis)features of the NetBSD Korn Shell. Let's
see the details.
[1] The Korn shell gets selected to run the Makefile recipes
-------------------------------------------------------------
On NetBSD, an autoconf-generated configure script will select /bin/ksh as
the $(SHELL) used to execute the Makefile recipes:
$ grep 'SHELL.*=' tests/parallel-tests3.dir/*/config.log
tests/parallel-tests3.dir/parallel/config.log:SHELL='/bin/ksh'
tests/parallel-tests3.dir/serial/config.log:SHELL='/bin/ksh'
[2] The Korn shell has quirks w.r.t. signal handling
----------------------------------------------------
NetBSD's Korn Shell is one of those shells which try to "propagate"
terminating signals, as explained in the ``Signal Handling'' node of the
bleeding-edge autoconf manual (still unreleased as of today); see also these
relevant links:
<http://lists.gnu.org/archive/html/autoconf-patches/2011-09/msg00005.html>
<https://lists.gnu.org/archive/html/bug-autoconf/2011-09/msg00004.html>
<http://mail.opensolaris.org/pipermail/ksh93-integration-discuss/2009-February/004121.html>
And in fact, NetBSD's Korn Shell even seems to propagate a fatal signal
it has received *to its whole process group*! Let's see a few examples:
$ /bin/sh -c '/bin/sh -c "kill -15 \$\$"; echo alive'
[1] Terminated /bin/sh -c "kill...
alive
$ /bin/ksh -c '/bin/sh -c "kill -15 \$\$"; echo alive'
Terminated
alive
# ksh apparently terminates its parent
$ /bin/sh -c '/bin/ksh -c "kill -15 \$\$"; echo alive'
Terminated
$ /bin/ksh -c '/bin/ksh -c "kill -15 \$\$"; echo alive'
Terminated
Terminated
Just to be sure, let's try to trace the system calls made by the Korn
shell:
$ ktrace /bin/sh -c '
> echo parent: $$
> ktrace -a /bin/ksh -c "echo child: \$\$; kill -15 \$\$"
> echo alive
'
parent: 20429
child: 4829
Terminated
$ kdump ktrace.out | grep -i sig | grep -v __sig
4829 1 ksh CALL kill(0x12dd, SIGTERM)
4829 1 ksh PSIG SIGTERM caught handler=0x420810 mask=(): code=SI_USER sent by pid=4829, uid=1242)
4829 1 ksh CALL kill(0, SIGTERM)
4829 1 ksh PSIG SIGTERM SIG_DFL: code=SI_USER sent by pid=4829, uid=1242)
20429 1 sh PSIG SIGTERM SIG_DFL: code=SI_USER sent by pid=4829, uid=1242)
(Note that `0x12dd' is decimal 4829).
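The hexadecimal-to-decimal correspondence can be double-checked from the
shell itself. A minimal sketch, assuming a POSIX-conforming printf (the
variable name `pid_dec' is only illustrative):

```shell
# Convert the hexadecimal PID seen in the kdump output to decimal.
# Assumes a POSIX printf, which accepts C-style integer constants.
pid_dec=$(printf '%d' 0x12dd)
echo "$pid_dec"   # prints 4829
```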
[3] GNU make propagates signals to the running recipes
------------------------------------------------------
If GNU make receives a terminating signal while it's updating some target(s), it
propagates that signal to the currently-executing recipe(s):
$ cat Makefile
all: 1 2
1 2:
@trap 'echo got SIGTERM; exit 77' 15; while :; do :; done
$ gmake -j2 &
[1] 5980
$ kill $!
got SIGTERM
got SIGTERM
gmake: *** [2] Error 77
gmake: *** [1] Error 77
(FWIW, I find this to be a helpful and rational behaviour).
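The recipe's side of this exchange can be reproduced without make. The
following is a minimal sketch, assuming a POSIX shell whose `wait' reports
the exit status of a background subshell; the subshell merely stands in for
a recipe and is not what GNU make itself runs. The names `pid' and `status'
are illustrative, and `sleep 1' replaces the busy loop only to avoid
spinning the CPU:

```shell
# Stand-in for the recipe: trap SIGTERM (15), report it, exit with 77.
( trap 'echo got SIGTERM; exit 77' 15
  while :; do sleep 1; done ) &
pid=$!

sleep 1          # give the subshell time to install its trap
kill "$pid"      # plain kill sends SIGTERM, as in the example above

# Collect the subshell's exit status without tripping `set -e'.
status=0
wait "$pid" || status=$?
echo "exit status: $status"
```

On shells such as bash or dash this should print `got SIGTERM' followed by
`exit status: 77'; a shell that relays the signal to its whole process
group, as described in [2], may behave differently.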
[4] Putting it all together
---------------------------
So here is my diagnosis of what happens when `parallel-tests3.test' is
run on NetBSD with GNU make:
1) various setup/preparation commands get executed in this script; the
Korn shell gets selected to run the recipes of the Makefile;
2) "make -j1 check" is launched in the background:
cd serial
$MAKE -j1 check &
3) some more commands get run, and they finish before the background
make process launched in (2) has completed;
4) the shell executing `parallel-tests3.test' explicitly kills the still
running background "make" process with a SIGTERM:
cd ..
kill $!
5) GNU make "relays" the SIGTERM to the Korn shell executing the still
running recipe(s);
6) in turn, the Korn shell relays the SIGTERM to all processes in its
process group;
7) this includes the top-level make process that is running the automake
testsuite (if any); which explains the crash that is the object of
this bug report.
I'm not 100% positive that point (7) is completely correct, but I'm running
out of time now, so I'll settle for this explanation; kudos to anyone who
can give some confirmation about the correctness of point (7)!
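One way to gather evidence for point (7) is to compare process group IDs,
since `kill(0, SIGTERM)' reaches exactly the processes that share the
sender's process group. A minimal sketch, assuming a `ps' that supports the
POSIX `-o pgid=' and `-p' options, and using `sleep' as a stand-in for the
background make; all variable names are illustrative:

```shell
# Start a background job the way a non-interactive test script would.
sleep 30 &
child=$!

# Query the process group of this shell and of the background job.
parent_pgid=$(ps -o pgid= -p "$$" | tr -d ' ')
child_pgid=$(ps -o pgid= -p "$child" | tr -d ' ')

if [ "$parent_pgid" = "$child_pgid" ]; then
  echo "same process group: $parent_pgid"
else
  echo "different process groups: $parent_pgid vs $child_pgid"
fi

# Clean up the stand-in job.
kill "$child"
wait "$child" 2>/dev/null || :
```

In a non-interactive shell job control is normally off, so the two IDs are
expected to match; that would be consistent with point (7), though it is
not a proof of it.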
-*-*-*-
Now, the right fix for the bug is *not* to work around this behaviour
of the Korn shell; rather, we should fix the suspicious logic of the
`parallel-tests3.test' script, which was also causing the testsuite to
hang on FreeBSD. Patch coming up shortly.
And it goes without saying that this horrendous incompatibility of NetBSD's
Korn Shell should be documented in the autoconf manual; I might give it a
shot in the next few days if nobody beats me to it.
Regards,
Stefano