Re: "wait" loses signals

From: Robert Elz
Subject: Re: "wait" loses signals
Date: Mon, 24 Feb 2020 15:59:43 +0700

    Date:        Fri, 21 Feb 2020 10:07:25 -0500
    From:        Chet Ramey <address@hidden>
    Message-ID:  <address@hidden>

  | That's just not reasonable. You're saying signals that are received before
  | the wait builtin begins executing (say, while the command is being parsed,
  | or the shell is doing some other bookkeeping task) should be considered
  | to have arrived while the wait builtin is executing. I'm pretty sure that's
  | not consistent with the letter or the spirit of the standard.

It quite clearly isn't consistent; what the standard says is:

     When the shell is waiting, by means of the wait utility, for
     asynchronous commands to complete, the reception of a signal for
     which a trap has been set shall cause the wait utility to return
     immediately with an exit status >128, immediately after which the
     trap associated with that signal shall be taken.

Note: "when the shell is waiting for an asynchronous command to complete"
(when that happens as a result of the user/script executing the wait utility)
then ...

What Denys is failing to realise, is that the standard describes what shells
do (or more accurately perhaps, did, in the late 1980's or early 1990's)
not what someone might want them to do.

And that is, when the wait/waitpid/wait3/wait4/waitid/wait6 (whatever the
shell uses) system call returned EINTR, the wait utility exited with a
status indicating it was interrupted by that signal (status > 128 means
128+SIGno) and ran the trap.

Because that is what shells actually did - the alternative being to
simply restart the wait on EINTR like many other system calls that are
interrupted by signals are conventionally restarted.

Like it or not, that's what shells did, what most still do, and what
the standard says must be done.
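
A minimal sketch of that behaviour, for anyone who wants to see it. The
variable names and the helper subshell that sends the signal are made up
for illustration; the 1 second delay is only there so the signal arrives
while the shell is already blocked in wait.

```shell
#!/bin/sh
# Sketch: a trapped signal interrupts "wait", which returns a status > 128,
# and the trap action runs immediately afterwards.
got=no
trap 'got=yes' USR1

sleep 3 &                       # the asynchronous command being waited for
child=$!
( sleep 1; kill -USR1 $$ ) &    # hypothetical helper: signal this shell after 1s

wait "$child"                   # interrupted by USR1 before sleep finishes
status=$?                       # 128 + signal number; the value is system specific
echo "wait status: $status, trap ran: $got"

kill "$child" 2>/dev/null       # the sleep is still running, clean it up
wait                            # reap the remaining children
```

The exact status depends on the number the system assigns to SIGUSR1, so
the sketch only relies on it being greater than 128.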

Apart from that, and not interrupting a wait for a foreground process,
the standard says very little about when traps should be run, and sorry
Harald, but your "as soon as" from ...

address@hidden said:
  | In the same way, I think that except when overridden by 2.11, the "when"
  | in "Otherwise, the argument action shall be read and executed by the
  | shell when one of the corresponding conditions arises." should be
  | interpreted as "as soon as". 

The only way to do that literally would be to run the trap from the signal
handler, as that is "as soon as" the condition arises.   But I think we all
know that is simply not possible.   So let's read that as "as soon as
possible after" instead.   That's getting more reasonable, but someone needs
to decide just what is possible - will running the trap handler mess up the
shell's internal state while a new command is parsed and executed?

Eg: what if we had
        VAR=$(grep  -c some_string file*.c)
and a (trapped) signal arrives while grep is running (more correctly, while
the process running the command substitution, which runs grep, is running).
We know we cannot interrupt the wait for that foreground process to run the
trap handler, so we don't - but do we execute the trap handler before we
assign the answer to VAR ?

This kind of thing is why shells in general normally only look to see if
there is a trap handler waiting to run after a command has finished
executing, not in the middle of one.

The relevance of this is that if a signal arrives while the wait command
is executing (or as Chet suggested, while doing whatever housekeeping is
needed to prepare to run it, like looking to see what command comes next)
but before the relevant wait*() system call is running, the trap won't
be run until after the wait command completes.

That's the way shells have always worked, and the way the standard (for that
very reason) says can be relied upon by scripts - which is much of its
purpose, to tell script writers what they can expect will work, and what
will not necessarily work.

Now the standard doesn't preclude a shell from looking for pending traps
as frequently as it wants to; every second line of C code in the shell could be
        if (traps_pending) run_trap_handler();

But most shell authors (I believe) wouldn't consider that reasonable.

The standard also doesn't preclude a shell from taking extra measures to
push the arrival of a signal in the wait utility down to occur in the wait
system call (or whatever replaces it).   Old shells didn't do that, as there
simply was no mechanism for that, and using SIGCHLD was always problematic
because of its quite different implementations on different (now ancient)
systems, hence we have what we have.   The standard is not a legislature,
and does not change the rules just because what is there doesn't look
reasonable, or you don't like it.

If you want things changed, convince the major shell maintainers that this
race condition is something they should make their shell go slower to
fix (because that's really all it takes on modern systems) and wait for
them to comply.   When most major shells (perhaps all major shells, and
some of the others) have implemented what you want, then you can suggest
to the standards body that this is something that ought to be made available
as a reliable feature that scripts can rely on.   After that expect to wait
10-15 years for enough time to pass for a new version of the standard to be
due (it won't happen in a correction update) before anything happens.

I'm not a "major" shell maintainer by any means - but you would have trouble
convincing me of that - I simply don't believe that the trap/kill combination
is or ever was intended to be an IPC mechanism for the shell - rather traps
allow 2 main features ... they allow cleanup after various errors (deleting
temp files, etc) and they allow the script to report what it is doing when
requested (kind of like SIGINFO, using any available signal, but giving
script provided information).

For the latter, consider something like

        trap 'printf "At step %s\\n" "${step}"' USR1

        : $((step = step + 1))
        # do something
        : $((step = step + 1))
        # do something
        : $((step = step + 1))
        make world

If a USR1 signal arrives while "make world" is running, we know that
we are not allowed to run the trap handler (that's because no shells
ever did - I know the FreeBSD shell has an option to alter that, but
portable scripts cannot assume any such thing).  Hence the "At step 3"
message would not appear until after the make has finished (perhaps hours
after the signal arrived).
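
A small sketch of that delay, with a 3 second sleep standing in for
"make world". The helper subshell and the variable names are invented for
the example, and it assumes a date that supports +%s (common, but not
something every old system has).

```shell
#!/bin/sh
# Sketch: a trap is not run while a foreground command is executing;
# it runs only after that command completes.
ran_at=
trap 'ran_at=$(date +%s)' USR1

( sleep 1; kill -USR1 $$ ) &    # hypothetical helper: signal arrives after ~1s
start=$(date +%s)
sleep 3                         # foreground command, stands in for "make world"
wait                            # reap the helper

delay=$(( ${ran_at:-0} - start ))
echo "signal sent after about 1s, trap ran ${delay}s after start"
```

In shells that behave as described, the trap action runs when the
foreground sleep finishes, not when the signal was sent.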

That might be rewritten as

        make world &
        while :
        do
                wait $!
                if [ $? -ne 158 ]       # 158 means SIGUSR1 for me;
                then                    # a portable script would need to
                        break           # determine it dynamically
                fi
        done

which most of the time will improve things, but because of the race reported
there's a (very) small chance that every now and then it will end
up waiting even though a signal was sent just before it started its system
call. In that situation your average user will just send another signal,
which this time will interrupt the sys call.   What we have now works well
enough for both these scenarios: the trap is eventually run, and if it seems
to be taking too long, another signal can simply be sent.

That's of no use for IPC purposes, but I don't care.
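
For the "determine it dynamically" part of the wait loop, one possible
sketch uses "kill -l exit_status", which POSIX specifies as mapping the
exit status of a signalled wait back to a signal name. The sleep stands in
for "make world", the helper subshell stands in for the user sending the
signal, and the counter is only there to show what happened; none of those
names come from the original script.

```shell
#!/bin/sh
# Sketch: keep waiting for a background job, treating an interruption by
# USR1 (detected by name, not by a hard-coded number) as "wait again".
trap 'printf "still waiting for the job\n"' USR1

sleep 2 &                       # stands in for "make world &"
job=$!
( sleep 1; kill -USR1 $$ ) &    # hypothetical helper: the user's signal

interrupted=0
while :
do
        wait "$job"
        status=$?
        # kill -l maps an exit status > 128 back to a signal name
        if [ "$status" -gt 128 ] && [ "$(kill -l "$status")" = USR1 ]
        then
                interrupted=$((interrupted + 1))
                continue        # trap has run, go back to waiting
        fi
        break                   # the job really finished
done
wait                            # reap the helper
echo "job exit status: $status, interruptions: $interrupted"
```

This cannot tell an interrupted wait apart from a job that was itself
killed by USR1, and it does nothing about the race described above.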

