bug-bash

From: Bob Proulx
Subject: Re: Bash cannot kill itself?
Date: Wed, 30 Jun 2010 18:50:38 -0600
User-agent: Mutt/1.5.18 (2008-05-17)

Chet Ramey wrote:
> Andreas Schwab wrote:
> > "Clark J. Wang" writes:
> > > It does not work as I expected. The running script was not
> > > terminated after 5 seconds. So what's wrong here?
> > 
> > The shell is waiting for foreground process (sleep) to finish.  During
> > this time no other process will be started by the shell.
> 
> Yes.  The trap is not taken until after the foreground process has completed.
> If you run sleep in the background and use `wait', the signal will interrupt
> it.

There are some subtle points that I didn't think had been made
completely clear yet, so I wanted to add a few words.

The 'sleep' command here is the standalone sleep command, probably the
coreutils sleep.  It is not a bash builtin.  This differs from some
other shells such as ksh, where sleep is a shell builtin.  In ksh the
sleep would be performed by ksh itself and 'kill $$' would affect the
sleep.  When bash invokes 'sleep 3600' it is executed as a normal
process with its own unique process id, which will not be $$.  It will
be a child of $$.  Therefore 'kill -ALRM $$' won't interrupt the sleep;
instead it sends the signal to the bash process that is interpreting
the script and invoking the sleep.
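One way to check this for yourself, sketched under the assumption that
sleep on your system is an external program (as with coreutils) and
not a loadable builtin: 'command -v' prints a full path for an external
command but only the bare name for a builtin, and a backgrounded sleep
gets a process id of its own that differs from $$.

```bash
#!/bin/bash
# 'command -v' prints a full path for an external program, or just the
# bare name for a shell builtin.
sleep_path=$(command -v sleep)
echo "sleep resolves to: $sleep_path"

# Run it in the background only so that its PID is visible via $!.
sleep 1 &
sleep_pid=$!
echo "script PID (\$\$): $$   sleep PID: $sleep_pid"
wait "$sleep_pid"
```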

But bash is waiting for the foreground process to complete.  It won't
run the trap for a signal it has caught until the invoked process has
returned.  At that point it takes notice of the trapped signals and
handles them.  Interestingly, if the signal were not trapped, the
default action would apply instead, and for SIGALRM (see signal(7))
the default action is process termination.  So without the trap the
script would exit immediately, leaving the sleep running as an orphan.
But leaving the sleep running as an orphan isn't desirable either.
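A small sketch of the deferral (the 1-second and 3-second timings are
arbitrary choices for the demo): the SIGALRM arrives after about one
second, but the trap only runs once the 3-second foreground sleep has
finished.

```bash
#!/bin/bash
start=$(date +%s)
# Record how many seconds after start the ALRM trap actually runs.
trap 'elapsed=$(( $(date +%s) - start )); echo "ALRM trap ran after ${elapsed}s"' ALRM

# Background helper: signal this shell ($$) after about 1 second.
( sleep 1; kill -ALRM $$ ) &

# Foreground external sleep: the trap is deferred until it returns.
sleep 3
echo "foreground sleep returned"
wait    # reap the helper subshell
```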

If you ran the script without the trap and then looked in the process
table, you would find the 'sleep 3600' process (or perhaps several
processes, if you ran it several times) still running with a parent
process id of 1: when the parent script exited, the child process was
inherited by the init process.  Of course init (process id 1) waits
for all of its children, so eventually, when the sleep expires, its
process table entry is cleaned up.
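To watch the orphaning happen, here is a sketch that uses a throwaway
inner bash script with no trap (the pidfile and variable names are made
up for the demo).  The inner script starts its sleep with '&' plus
'wait' purely so the sleep's process id can be recorded; with no trap
installed, the default SIGALRM action kills that shell either way.
SIGALRM is signal 14 on Linux, so the inner script's status should be
128 + 14 = 142.  In a container or under a service manager the orphan
may be adopted by a subreaper rather than by process id 1;
'ps -o pid,ppid,args -p PID' shows which.

```bash
#!/bin/bash
pidfile=$(mktemp)

# Inner script with NO ALRM trap.  The sleep is backgrounded only so its PID
# can be written out; without a trap, SIGALRM kills this shell either way.
bash -c '
  sleep 30 &
  echo $! > "$1"
  wait
' inner "$pidfile" &
inner_pid=$!

sleep 1                       # give the inner script time to start its sleep
kill -ALRM "$inner_pid"       # default action for SIGALRM: terminate
inner_status=0
wait "$inner_pid" || inner_status=$?

sleep_pid=$(cat "$pidfile")
if kill -0 "$sleep_pid" 2>/dev/null; then
  orphan_alive=yes
  echo "sleep $sleep_pid outlived its parent script; cleaning it up"
  kill "$sleep_pid"
else
  orphan_alive=no
fi
rm -f "$pidfile"
echo "inner status: $inner_status, orphan alive: $orphan_alive"
```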

So...  Sending a kill to the parent script probably isn't what you
want.  Depending upon whether you are trapping the signal, the script
will either keep waiting for the foreground process to finish or will
exit and abandon the child as an orphan.  Instead you probably want to
send the signal to the sleep process too.  Then the sleep would be
interrupted and would return to the shell.  But getting the process id
of a foreground sleep at that point is hard.

I hate to mention this next part, but you could send the signal to the
process group.  On systems with job control, each job is run in its own
process group, and negative PID values may be used to select whole
process groups.  When the script is started as a job from an
interactive shell it is the leader of its own process group, so that
group's id is $$.  So if that kill were 'kill -ALRM -$$' then it would
send the signal to all processes in the process group.  That would
reach both the script and the sleep process, and also any other
process that was in the same process group.  Note that systems with
job control will behave differently than systems without job control.
If run on a system without job control, IIRC all processes in the
login session will be in the same process group and so all would get
the signal.  But I no longer have access to such a test environment
and so couldn't check this.  In that environment I think the result
would be bad.  Probably you would be logged out.  I think.  I'm not
sure what it would do if run by root as part of a system startup.
Nothing good, I am sure.  I can't think of any way to safely use this
tidbit of information and I probably shouldn't have even brought it
up.

So...  Restructuring your process flow to be aware of these issues as
suggested by others would probably be your best choice.

> If you run sleep in the background and use `wait', the signal will interrupt
> it.

Using that flow, the child process will still be running when the trap
fires, but the parent script can send a signal to the child in the
script's signal handler.  That will prevent leaving the child running
as an orphan.
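A minimal sketch of that flow, with a stand-in one-second watchdog
subshell taking the place of the original script's wait_kill (an
assumption for the demo): 'wait' returns a status greater than 128 as
soon as the trapped signal arrives, and the handler then kills the
background child so it isn't left behind.

```bash
#!/bin/bash
# Handler: forward the timeout to the background child instead of orphaning it.
trap 'kill "$childpid" 2>/dev/null; echo "ALRM: killed child $childpid"' ALRM

sleep 30 &                     # the real work, started in the background
childpid=$!
( sleep 1; kill -ALRM $$ ) &   # stand-in watchdog for this demo

wait_status=0
wait "$childpid" || wait_status=$?   # > 128 when a trapped signal interrupts it
wait                                  # reap the killed child and the watchdog
echo "wait returned $wait_status"
```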

But don't forget that the reverse problem also exists.  The wait_kill
function is forked into the background and will eventually send a
signal to whatever process is $$ when it fires.  If the script
terminates earlier, then similar background child process handling
should be in place for the background wait_kill as well.  Otherwise,
in the worst case, once the script has exited and its process id has
been reused, the timeout watchdog could kill a completely unrelated
process!  And there might be an error from the kill if at the last
moment the target exited on its own.  So more handling is needed.  I
think perhaps something like this:

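  #!/bin/bash
  # On SIGALRM: kill the background sleep (if started), report, and exit.
  trap 'test -n "$childpid" && kill "$childpid"; echo killed by SIGALRM; exit 1' ALRM
  # Watchdog: after 5 seconds, send SIGALRM to this script ($$).
  wait_kill() {
      sleep 5
      kill -ALRM $$ 2>/dev/null
  }
  wait_kill &
  waitkillpid=$!
  # Run the work in the background so that 'wait' can be interrupted.
  sleep 3600 &
  childpid=$!
  wait "$childpid"
  # The work finished first: cancel the watchdog before it can fire.
  test -n "$waitkillpid" && kill "$waitkillpid" 2>/dev/null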

Well...  That is probably enough on this topic for now. :-)

Bob

P.S. In the old days mixing sleep(3) and SIGALRM was frowned upon.
When I first read the problem I thought it was going to be a SIGALRM
interaction.  The man page for sleep(3) still carries this warning.

       sleep() may be implemented using SIGALRM; mixing calls to
       alarm(2) and sleep() is a bad idea.

The glibc info page describes the problem in a little more detail:

  info libc Sleeping

or the online page here

  http://www.gnu.org/software/libc/manual/html_node/Sleeping.html#Sleeping

  On some systems, sleep can do strange things if your program uses
  SIGALRM explicitly.  Even if SIGALRM signals are being ignored or
  blocked when sleep is called, sleep might return prematurely on
  delivery of a SIGALRM signal.  If you have established a handler for
  SIGALRM signals and a SIGALRM signal is delivered while the process
  is sleeping, the action taken might be just to cause sleep to return
  instead of invoking your handler.  And, if sleep is interrupted by
  delivery of a signal whose handler requests an alarm or alters the
  handling of SIGALRM, this handler and sleep will interfere.

The POSIX docs say more interesting things about interaction with
SIGALRM.

  http://www.opengroup.org/onlinepubs/009695399/utilities/sleep.html

But the coreutils 'sleep' is implemented using gnulib's xnanosleep
wrapper around nanosleep.  The nanosleep(2) system call documentation
notes this in regard to signal usage:

       Compared to sleep(3) and usleep(3), nanosleep() has the
       following advantages: it provides a higher resolution for
       specifying the sleep interval; POSIX.1 explicitly specifies
       that it does not interact with signals; and it makes the task
       of resuming a sleep that has been interrupted by a signal
       handler easier.

Which I interpret as saying that when sleep is implemented with
nanosleep there isn't an interaction with SIGALRM.  If the command is
implemented using sleep(3) then SIGALRM should be avoided, but SIGALRM
is okay to use when it is implemented using nanosleep(2).  Portable
scripts, however, shouldn't rely upon this implementation detail.  It
would be best not to use SIGALRM together with sleep at all, and so
sidestep the issue.
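Following that advice, the same watchdog pattern can use a signal that
sleep(3) implementations have no reason to touch.  This sketch picks
SIGUSR1 (an arbitrary choice, not anything mandated) and shortens the
timeout to one second so it is quick to try; the timeout_secs and
timed_out variables are hypothetical names for the demo.

```bash
#!/bin/bash
timeout_secs=1        # hypothetical short timeout for the demo
timed_out=0

# On USR1: note the timeout and stop the background work.
trap 'timed_out=1; test -n "$childpid" && kill "$childpid" 2>/dev/null' USR1

# Watchdog: after the timeout, signal this script with USR1 (not ALRM).
( sleep "$timeout_secs"; kill -USR1 $$ 2>/dev/null ) &
watchdog=$!

sleep 30 &            # the long-running work
childpid=$!
wait "$childpid" || :                 # > 128 if interrupted by the trap
kill "$watchdog" 2>/dev/null || :     # cancel the watchdog if work finished first
wait                                  # reap whatever is left
echo "timed_out=$timed_out"
```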


