bug-bash
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: SIGINT handling


From: Stephane Chazelas
Subject: Re: SIGINT handling
Date: Mon, 21 Sep 2015 22:07:55 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

2015-09-21 15:04:46 -0400, Chet Ramey:
> On 9/20/15 3:45 PM, Stephane Chazelas wrote:
> > 2015-09-20 17:12:45 +0100, Stephane Chazelas:
> > [...]
> >> I thought the termsig_handler was being invoked upon SIGINT as
> >> the SIGINT handler, but it is being called explicitely by
> >> set_job_status_and_cleanup so the problem is elsewhere.
> >>
> >> child_caught_sigint is 0 while if I understand correctly it
> >> should be 1 for a cmd that calls exit() upon SIGINT. So that's
> >> probably probably where we should be looking.
> > [...]
> > 
> > I had another look.
> > 
> > If we're to beleive gdb, child_caught_sigint is 0 because
> > waitpid() returns without EINTR even though wait_sigint_received
> > is 1.
> > 
> > The only reasonable explanation I can think of is that the child
> > handles its SIGINT first, exits which updates its state and
> > causes bash the parent to be scheduled, and waitpid() returns
> > (without EINT) and after that bash's SIGINT handler kicks in too
> > late.
> 
> Absent kernel problems, there are four scenarios for the child process
> reacting to SIGINT:
> 
> 1.  The SIGINT arrives before the child begins executing.
> 
> 2.  The SIGINT arrives while the child is executing.
> 
> 3.  The SIGINT arrives while the child is exiting successfully.
> 
> 4.  The SIGINT arrives after the child has exited but before the
>     parent's waitpid() returns.
> 
> In the first two cases, the shell's waitpid() should return -1, but the
> first case will probably return ECHILD while the second returns EINTR.
> In the third case, there's not really anything the shell can do, since
> there's nothing to distinguish that case from one where the child catches
> SIGINT and exits successfully, and your patch doesn't change things.
> The fourth case will, in practice, be indistinguishable from the third
> case, since the kernel is usually `greedy' and will not return EINTR if
> there is something to report.

The problem is that here the parent's SIGINT handler is run upon
the return from waitpid(), just after. My patch doesn't rely on
EINTR from waitpid() (which doesn't happen here, waitpid() returns
with the pid of the child that did an exit() upon receiving
SIGINT), just on the "status" returned by the child, so doesn't
have the problem.

There would still be a problem if SIGINT was handled even
later (after we test for it), but I could not reproduce that.

Given that the SIGCHLD should come before SIGINT, it would seem
reasonable to assume SIGINT should be handled at the latest upon
the return of waitpid().

Can you please clarify why the check for EINTR was needed?

What do you suggest we do to fix that issue?

> In all these cases, I assume that bash has called waitchld() and
> waiting_for_child == 1.  If it's not, the signal handler treats the
> signal as it would normally, if it were not waiting for a child to
> exit.

Maybe the test scenario was not clear:

bash -c 'cmd; echo hi'

is run from an interactive shell, cmd is a long running
application (the problem that sparked this discussion was with
ping and I showed examples with an inline-script calling sleep)
that has a handler on SIGINT that calls exit().

Upon pressing ^C, a few seconds after starting that, so at a
time where bash is doing waitpid() and cmd is doing something
like sleep(), the tty line discipline sends SIGINT to the
foreground process group, so both bash and cmd.

Now, the whole problem is caused by cmd calling exit() straight
upon receiving that SIGINT. So everything happens at the same
time.

In the case where "hi" is not output, we have the events in this
order:
- SIGINT is sent to bash and cmd
- cmd handles its SIGINT and calls exit()
- bash's waitpid() returns without being interrupted with the
child's status being 0 (in anycase not WIFSIGNALED() with
SIGINT). 
- Straight upon return of that syscall (gdb shows a call trace
for the SIGINT handler in the __waipid() libc wrapper), bash's
SIGINT handler is executed.
- we return from the handler in __waitpid(), and then:
      if (pid < 0 && errno == EINTR && wait_sigint_received)
       child_caught_sigint = 1;
  because waipid() did *not* return with EINTR, we have
  child_caught_sigint = 0 (even though the child clearly caught
  sigint as it returned with WIFEXITED), so even though
  wait_sigint_received is 1, bash makes the wrong decision,
  decides the child did not catch SIGINT and kills itself with
  SIGINT (and so doesn't run echo hi).

Things that don't look right in the code either are things like
where those conditions are asserted and tested and the handler
set and reset, but then again I don't have the full picture.

> > Anyway, this patch makes the problem go away for me (and
> > addresses my problem #2 about exit code 130 not being treated
> > as an interrupted child). It might break things though if there
> > was a real reason for bash to check for waitpid()'s EINTR.
> 
> You should read
> 
> http://lists.gnu.org/archive/html/bug-bash/2011-02/msg00088.html
> 
> for a summary of why the test for waitpid() returning -1/EINTR exists.
> Linus's posts, at least the ones where there's more light than heat, are
> good reading.

The thing is that thread was about the opposite problem at the
other end of the spectrum so we need to find the right way to do
it so we don't cause one problem or the other.

Note that
ksh93 -c 'sh -c "trap exit INT; sleep 10; :"; echo hi"'
consistently outputs "hi" because ksh93 solely relies on the
status returned by waitpid() (regardless of whether it itself
received a SIGINT or not).

> > With that patch applied,
> > 
> > ./bash -c 'sh -c "trap exit INT; sleep 120; :"; echo hi'
> > ./bash -c 'mksh -c "sleep 120; :"; echo hi'
> > 
> > Does *not* output "hi" (as mksh or sh do a exit(130) which is
> > regarded as them being "interrupted by that SIGINT", or at least
> > reporting that the child they want to report the status of
> > (sleep) has been killed by a SIGINT).
> 
> This still counts as catching and handling the SIGINT, and the shell
> should not act as if the foreground process died as a result of one.

That's the point I'm arguing on.

If the command handled SIGINT and returned with 130, I argue it
is considering itself and telling its parent as having been
"interrupted"

The things that the WCE is guarding against (commands using ^C
for something else like your emacs one) are typically cases
where the command will *not* return with 130.

And consider the two cases I gave above as real-life examples
with mksh or a script with a signal handler that cleans-up:

trap '
  status=$?
  cleanup
  exit "$status"' INT TERM HUP

-- 
Stephane



reply via email to

[Prev in Thread] Current Thread [Next in Thread]