Re: Bug where SIGINT trap handler isn't called
From: Patrick Plagwitz
Subject: Re: Bug where SIGINT trap handler isn't called
Date: Sun, 26 Jul 2015 03:08:42 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.0.1
On 22/07/15 04:01, Chet Ramey wrote:
> On 7/16/15 12:05 AM, Patrick Plagwitz wrote:
>
>>> This is another case of the scenario most recently described in
>>>
>>> http://lists.gnu.org/archive/html/bug-bash/2014-03/msg00108.html
>>>
>>> In this case, python appears to catch the SIGINT (it looks like a
>>> KeyboardInterrupt exception), print the message, and exit with status 1.
>>>
>>> Chet
>>>
>>
>> Ok, I see.
>> However, there appears to be some race condition when waiting for a
>> command substitution.
>> I have the attached combination of scripts.
>>
>> When run with
>> $ bash run-until-end-of-loop.sh bash start-subst-loop.sh '2>/dev/null'
>> , the script will try to launch and SIGINT subst-loop.sh repeatedly
>> until the SIGINT trap is once *not* called which will happen in at most
>> 200 repetitions on my machine. The script then ends after printing “end
>> of loop”. The python script execute-with-sigint.py is only there to
>> enable subst-loop.sh to receive a SIGINT at all.
>> subst-loop.sh calls date(1) in a loop; date should have SIGINT set to
>> SIG_DFL other than python initially.
>>
>> I analyzed the execution by inserting some debug output into the bash
>> code. It seems that in the case that the SIGINT trap is not called
>> subst-loop.sh gets the SIGINT while (or shortly before or after) calling
>> waitpid in waitchld:jobs.c which will then return without errno ==
>> EINTR. The wait_sigint_handler will be called, though, and so
>> wait_sigint_received will be true.
>
> This doesn't agree with what I see on RHEL6. I get waitpid() returning -1/
> EINTR, which bash interprets, using a heuristic, to mean that the child
> blocked or caught SIGINT, in which case bash should not act as if it
> received it.
>
> There is a small race condition here, which is very hard to close while
> maintaining the desired behavior: bash only responds to SIGINT received
> while waiting for a child if the child exits due to SIGINT. You appear
> to have hit it: the timing of the ^C is such that waitpid returns
> -1/EINTR, causing the shell to ignore the ^C. I suspect this is because
> the child called exit(0) before the ^C arrived and the shell got the
> signal, but the child exited successfully, causing the shell to assume
> the child blocked or caught the SIGINT.
>
> That heuristic was developed as the result of an extensive discussion
> between me and several Linux kernel developers back in 2011. You can
> read that here:
>
> http://lists.gnu.org/archive/html/bug-bash/2011-02/msg00050.html
> http://lists.gnu.org/archive/html/bug-bash/2011-03/msg00000.html
>
> I will look at the signal handler race condition you identified.
>
> Chet
>
Thanks. And thanks for the information.
So
$ bash -c 'python -c "import time; time.sleep(10)"; echo foo'
outputs foo when ^C'd because bash suppresses its normal SIGINT handling if
the child is assumed to have handled the signal itself. Meanwhile
$ bash -c 'sleep 10; echo foo'
doesn't output foo.
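To make the distinction concrete, here is a minimal sketch (my own illustration, not taken from the bash sources) of what the parent can observe in the two cases. It assumes SIGINT is at its default disposition when the script starts, and it uses kill -INT $$ inside the child in place of a real ^C; the variable names are made up for the example.

```shell
# Case 1: the child leaves SIGINT at its default disposition and is killed by it.
# The parent sees a signal death, which the shell reports as 128 + 2.
status_killed=0
bash -c 'kill -INT $$' || status_killed=$?

# Case 2: the child catches SIGINT and exits on its own, much as python does
# after a KeyboardInterrupt. The parent sees an ordinary exit, not a signal death.
status_caught=0
bash -c 'trap "exit 1" INT; kill -INT $$; exit 0' || status_caught=$?

echo "killed by SIGINT: $status_killed"
echo "caught SIGINT:    $status_caught"
```

Only the first kind of child termination leads the waiting shell to treat the ^C as meant for itself as well.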
But custom trap handlers are called either way in
set_job_status_and_cleanup.
$ bash -c 'trap "echo trap; exit" INT; python -c "import time; time.sleep(10)"; echo foo'
prints trap, not foo, as does
$ bash -c 'trap "echo trap; exit" INT; sleep 10; echo foo'
I think command substitution has two issues. Both arise because
set_job_status_and_cleanup isn't called for the single child made for comsub.
(1) The bug that was fixed in the discussion you linked is still present for
comsub waits:
$ bash -c 'while [ "$(exec >&-; sleep 0.001)" = "" ]; do :; done'
sometimes requires two ^Cs to stop.
The reason seems to be the same race condition described in
http://lists.gnu.org/archive/html/bug-bash/2011-02/msg00073.html
and in your last mail and mine. Checking last_command_exit_value ==
(128 + SIGINT) in command_substitute reflects the way what is now called
child_caught_sigint was determined before the patch made during that
discussion (i.e. it doesn't implement the heuristic).
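A small sketch of why a check on the exit status alone cannot tell a child killed by SIGINT apart from one that simply exits with the same number. This is an illustration under my own assumptions, not the code in command_substitute: looks_like_sigint is a hypothetical helper, and SIGINT is assumed to be signal 2.

```shell
# Hypothetical stand-in for a check of the form "status == 128 + SIGINT".
looks_like_sigint() {
    [ "$1" -eq $((128 + 2)) ]   # SIGINT assumed to be signal number 2
}

# A child that really is killed by SIGINT.
killed=0
bash -c 'kill -INT $$' || killed=$?

# A child that merely exits with the same number.
faked=0
bash -c 'exit 130' || faked=$?

# Both pass the status-only test, although only the first died from the signal.
looks_like_sigint "$killed" && echo "killed child matches"
looks_like_sigint "$faked" && echo "plain exit 130 matches too"
```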
As a side note,
http://lists.gnu.org/archive/html/bug-bash/2011-03/msg00039.html
explained to me why this outcome of the race condition is at all likely.
(2) Running the comsub version of one of the above scripts:
$ bash -c 'trap "echo trap; exit" INT; foo="$(exec >&-; python -c "import time; time.sleep(10)")"; echo foo'
yields a different result (it prints foo, not trap). This is the behavior
the original bug report was about, and you have already replied to it, but
is the difference between comsub children and normal ones intended here?