Re: "wait" loses signals
From: Chet Ramey
Subject: Re: "wait" loses signals
Date: Wed, 19 Feb 2020 15:30:17 -0500
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:68.0) Gecko/20100101 Thunderbird/68.5.0

On 2/19/20 5:29 AM, Denys Vlasenko wrote:
> A bug report from Harald van Dijk:
>
> test2.sh:
> trap 'kill $!; exit' TERM
> { kill $$; exec sleep 9; } &
> wait $!
>
> The above script ought exit quickly, and not leave a stray
> "sleep" child:
> (1) if "kill $$" signal is delivered before "wait",
> then TERM trap will kill the child, and exit.
This strikes me as a shaky assumption, dependent on when the shell receives
the SIGTERM and when it runs traps. (There's nothing in POSIX that says
when pending traps are processed. Bash runs them after commands.)
> (2) if "kill $$" signal is delivered to "wait",
> it must be interrupted by the signal,
> then TERM trap will kill the child, and exit.
This is well-defined by POSIX.
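
For reference, case (2) can be observed in isolation with a sketch like the one below. It is not part of the original report: the one-second delay, the `child` and `status` names, and the use of `sh -c` to get a fresh shell are all assumptions made for illustration. The delay makes it very likely, though not strictly guaranteed, that the signal arrives while `wait' is blocked, in which case POSIX requires `wait' to return immediately with a status greater than 128.

```shell
#!/bin/sh
# Sketch of case (2): SIGTERM arrives while `wait' is blocked.
# Assumption: a 1-second delay is long enough for the shell to reach `wait'.
status=$(sh -c '
    trap "echo trap ran >&2" TERM
    sleep 5 >/dev/null &              # background job to wait for
    child=$!
    ( sleep 1; kill -TERM $$ ) >/dev/null &   # $$ is the sh -c shell
    wait "$child"                     # interrupted by the trapped signal
    echo $?                           # status of the interrupted wait
    kill "$child" 2>/dev/null         # do not leave the sleep behind
')
echo "wait returned $status"
```

On the shells I would expect people to try this with, the reported status is 128 plus the signal number, but POSIX only promises a value greater than 128.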
>
> The helper to loop the above:
>
> test1.sh:
> i=1
> while test "$i" -lt 100000; do
> echo "$i"
> "$@" test2.sh
> i=$((i + 1))
> done
>
> To run: sh test1.sh <shell_to_test>
>
> bash 4.4.23 fails pretty quickly:
>
> $ sh test1.sh bash
> 1
> ...
> 581
> _ <stops here for ~9 seconds>
It seems inherently racy. I ran this with a lightly-instrumented bash
and discovered that signals that arrived while `wait' was running were
always processed correctly and killed the process. There were a few
times when the signal arrived while `wait' was not running, and some
of these cases did not interrupt wait or cause trap execution.

Consider this scenario:

1. Bash forks and starts the background process
2. fork() returns in the parent
3. The parent bash checks for pending traps, and finds none
4. SIGTERM arrives, and the trap signal handler sets a `pending trap'
   flag for SIGTERM
5. The parent shell runs the `wait' builtin
6. `wait' is not interrupted, because the signal has already been
   delivered; it runs to completion, and only then does the trap run

The window for this is extremely small. I just ran the scripts on RHEL7
and had to go through the loop script multiple times before I saw the
9-second sleep. I saw it more often on Mac OS X, so the scheduler
probably plays a role.
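
Since the window is this small, it may be easier to count the slow iterations than to watch for them. The helper below is only a sketch and not something I ran as part of the above: `count_stalls' and its arguments are hypothetical names, it assumes `date +%s' is available, and it uses plain global variables because POSIX sh has no `local'.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly and count the runs that
# take at least <threshold> seconds of wall-clock time.
# Usage: count_stalls <iterations> <threshold_seconds> <command...>
count_stalls() {
    iterations=$1
    threshold=$2
    shift 2
    stalls=0
    i=0
    while [ "$i" -lt "$iterations" ]; do
        start=$(date +%s)             # assumption: date supports %s
        "$@" >/dev/null 2>&1          # the command under test
        end=$(date +%s)
        if [ $((end - start)) -ge "$threshold" ]; then
            stalls=$((stalls + 1))    # this run was slow
        fi
        i=$((i + 1))
    done
    echo "$stalls"
}
```

Something like `count_stalls 1000 5 bash test2.sh' would then print how many of 1000 runs took five seconds or longer, which with the 9-second sleep in test2.sh should correspond to the runs where `wait' was not interrupted.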
Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU chet@case.edu http://tiswww.cwru.edu/~chet/