bug-bash

potential bash bug, weird script behavior, Linux, SIGCHLD


From: Ingo Molnar
Subject: potential bash bug, weird script behavior, Linux, SIGCHLD
Date: Wed, 5 Dec 2007 21:56:24 +0100
User-agent: Mutt/1.5.17 (2007-11-01)

Oleg Nesterov has distilled a very simple (and reproducible) testcase 
below for what appears to be a potential long-existing bash bug. This is 
a problem that triggers on Linux quite frequently. (i can also send the 
configs.tar.bz2 testcase i made - but i think Oleg's is far simpler) I 
used bash-3.2-19.fc8 for my tests, on Linux 2.6.24-0.39.rc3.git1.fc9.

        Ingo

----- Forwarded message from Oleg Nesterov <oleg@tv-sign.ru> -----

Date: Mon, 3 Dec 2007 18:42:51 +0300
From: Oleg Nesterov <oleg@tv-sign.ru>
To: Ingo Molnar <mingo@elte.hu>
Subject: Re: weird script behavior, signals?
Cc: Jan Kratochvil <jkratoch@redhat.com>,
        Roland McGrath <roland@redhat.com>

On 12/03, Ingo Molnar wrote:
>
> here's a fresh incident that is 100% reproducible. I constructed the
> following simple oneliner script to analyze saved kernel config files:
>
>  for N in `grep 'is not set' config* | cut -d\# -f2- | cut -d' ' -f2 |
>  sort | uniq`; do printf "%10d %s\n" `grep "$N=y" config* | wc -l` $N; done
>
> the script starts printing results like this:
>
>         [...]
>         30 CONFIG_B43LEGACY_DEBUG
>         15 CONFIG_B43LEGACY_DMA_AND_PIO_MODE
>         18 CONFIG_B43LEGACY_DMA_MODE
>         19 CONFIG_B43LEGACY_PIO_MODE
>         21 CONFIG_B43_DEBUG
>         15 CONFIG_B43_DMA_AND_PIO_MODE
>         17 CONFIG_B43_DMA_MODE
>          6 CONFIG_B43_PCMCIA
>         [...]
>
> now if i Ctrl-C the script, i get:
>
>    -bash: printf: CONFIG_AFS_FS: invalid number
>
> if i Ctrl-Z the script, i get hung output, due to:
>
>         |-login(2068)---bash(2306)---bash(10838)-+-grep(10839)
>         |                                        `-wc(10840)
>
> both grep and wc are in T+ state:
>
>  mingo    10839  0.0  0.0   6088   676 tty2     T+   06:14
>  mingo    10840  0.0  0.0   3800   428 tty2     T+   06:14   0:00 wc -l
>
> is this signal behavior really expected? I cannot kill the script - i

I assume you can still kill it by running "kill" on another console, yes?

> have to manually kill the wc and grep tasks and then have to wait until
> its finished. Is this normal?

Looks like a bash bug to me.

        $ echo `echo >&2 XXX; sleep 10000`
        $ ps ax
        ...
        2549 tty1     S      0:00 -bash
        2550 tty1     S+     0:00 sleep 10000
        ...

Small note: job control rules are black magic to me, so I assume it is
correct that "sleep" is in the foreground process group but "bash" is not.
This -bash, btw, is the child of the login shell; it executes `...`.

        $ cat /proc/2549/status
        ...
        ShdPnd: 0000000000000000
        SigBlk: 0000000000010000
        ...

No pending signals, but SIGCHLD is blocked, I think this is the reason.
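
The SigBlk value above can be decoded by hand: bit N-1 of the mask stands for signal N. A minimal sketch of that decoding follows. The function name decode_sigmask is hypothetical, and the names printed by "kill -l" depend on the platform and shell, so treat the output as illustrative rather than authoritative.

```shell
# Hypothetical helper: list the signals whose bits are set in a hex mask
# taken from the ShdPnd/SigBlk lines of /proc/PID/status.
decode_sigmask() {
    mask=$((0x$1))              # parse the hex string as an integer
    sig=1
    while [ "$sig" -le 64 ]; do
        # bit (sig - 1) set means signal number "sig" is in the mask
        if [ $(( (mask >> (sig - 1)) & 1 )) -eq 1 ]; then
            echo "$sig $(kill -l "$sig")"
        fi
        sig=$((sig + 1))
    done
}

decode_sigmask 0000000000010000
```

For the mask shown above only bit 16 is set, which is signal 17; on Linux/x86 that number is SIGCHLD.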

        $ cat /proc/2549/wchan; echo
        do_wait

Now I press Ctrl-Z, SIGTSTP goes to "sleep" and stops it.

        $ cat /proc/2550/status
        ...
        State:  T (stopped)
        ...

"sleep" notifies the parent,

        $ cat /proc/2549/status
        ...
        ShdPnd: 0000000000010000
        SigBlk: 0000000000010000
        ...

note the pending SIGCHLD. But it is blocked, signal_pending() is not true.
do_notify_parent_cldstop() does __wake_up_parent() anyway, but this doesn't
help because according to strace the "bash" does waitpid(-1, 0xafd37628, 0).

So do_wait() was called with options == WEXITED, it blocks again after wakeup.
This is correct because !signal_pending().
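
The /proc checks used in this analysis can be collected into one small helper, sketched here on the assumption of a Linux /proc filesystem. The name show_sigstate is hypothetical. It prints only the process state and the pending and blocked signal masks; /proc/PID/wchan is left out because its contents vary between kernels.

```shell
# Hypothetical helper: show the state and signal masks of a process,
# as read from /proc/PID/status (Linux only).
show_sigstate() {
    pid=$1
    grep -E '^(State|ShdPnd|SigBlk):' "/proc/$pid/status"
}

# Example: inspect the current shell.
show_sigstate "$$"
```

Run against the subshell's PID before and after Ctrl-Z, it would show ShdPnd changing while SigBlk stays the same.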

Unless I missed something, perhaps this should be reported to bash developers?

Oleg.

----- End forwarded message -----




