

lock-up on SIGSTOP/CONT (was: 'suspend' and 'xterm')

From: Sven Mascheck
Subject: lock-up on SIGSTOP/CONT (was: 'suspend' and 'xterm')
Date: 23 May 2001 07:28:16 +0200
User-agent: tin/1.4.5-20010409 ("One More Nightmare") (UNIX) (SunOS/5.8 (sun4u))

[xpost+fup2 comp.unix.misc]

Adam Sulmicki <address@hidden> wrote:
> <address@hidden> wrote:

[ shell in xterm, calling 'suspend' built-in -> xterm locking up! ]

>> Hm, you can ask xterm(1) to send itself a SIGCONT, [via xterm menu]
> [...] I have just tried it and once I type suspend with the bash,
> xterm stops to respond to any commands.

It depends on the OS.  I was rather curious and ran some
unrepresentative tests under xterm(1), sshd(8), telnetd(1/8) and
rlogind(1/8); see the table at the end of this posting.

In short: 

   This seems to happen when the parent process calls a blocking
   wait() and is also programmed with BSD-like signal handling.

   When a child changes status because of SIGSTOP, this usually also
   means a SIGCHLD to the parent, notifying it of this change.

   Some programs simply call wait(2) for their child in the
   corresponding handler and then lock up, because the child has not
   actually exited.

(so the shell actually doesn't matter)

Out of curiosity I looked at xterm-150 (on Linux, the fastest machine
for compiling among the local boxes where the lock-ups happen)
and found that it blocks in "pid = wait(NULL);"
[main.c, function "reapchild(int n)"].

Avoiding this blocking call restores the ability to send
a SIGCONT to the child via the menu.

I tried two ways, both of which seemed to work in principle:

 - using sigaction(2) to add SA_NOCLDSTOP to the flags
   ("don't send job-control SIGCHLDs"),
   in main.c, main(), after "signal (SIGCHLD, reapchild);":

        struct sigaction sig_act;
        sig_act.sa_handler = reapchild;
        sigemptyset(&sig_act.sa_mask);    /* no extra signals blocked in handler */
        sig_act.sa_flags = 0;
        sig_act.sa_flags |= SA_NOCLDSTOP; /* no SIGCHLD when a child only stops */
        /*    sig_act.sa_flags |= SA_RESETHAND; */
        /*    sig_act.sa_flags |= SA_NODEFER;   */
        if ( sigaction(SIGCHLD, &sig_act, NULL) < 0 ) {
            fprintf(stderr, "sigaction failed...\n");
        }

   Interestingly, signal(5) on Solaris mentions that SA_NOCLDSTOP is
   even set by default when signal(3) is used instead of sigaction(2).
   (BTW, psig(1) shows this for running processes.)
   That's why xterm is fine on Solaris.

 - modifying the wait in the handler main.c:reapchild(),
   from a plain "pid = wait(NULL)" to

        int status;
        pid = waitpid(-1, &status, WUNTRACED);
        if (pid > 0 && WIFSTOPPED(status))  /* only stopped: nothing to reap */
            SIGNAL_RETURN;

   (but this might be a faulty hack)

Please take the above with a grain of salt: I am not really familiar
with wait()/sigaction()/signal(), but I am certainly curious about
drawbacks of the above.

Interestingly, screen(1) checks the status returned by wait(2) and
immediately sends a SIGCONT if one of its children has been stopped
("Child has been stopped, restarting.").
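A rough C sketch of that idea, assuming POSIX waitpid(); this is not screen's actual code, and wait_restarting_stopped() is a hypothetical name. WUNTRACED makes waitpid() report stopped children too; a stopped child is sent SIGCONT and the loop keeps waiting until the child really terminates.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Wait for one child; if it was merely stopped, continue it and keep
   waiting. Returns the child's exit status, or -1 on error or if it
   was killed by a signal. */
static int wait_restarting_stopped(pid_t pid)
{
    int status;

    for (;;) {
        if (waitpid(pid, &status, WUNTRACED) < 0) {
            if (errno == EINTR)
                continue;                 /* interrupted by a signal: retry */
            return -1;
        }
        if (WIFSTOPPED(status)) {
            fprintf(stderr, "Child has been stopped, restarting.\n");
            kill(pid, SIGCONT);
            continue;
        }
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }
}
```

For instance, a child running raise(SIGSTOP); _exit(7); is continued once, after which wait_restarting_stopped() returns 7.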

                            xterm   sshd   telnetd rlogind
SunOS5.x                      +      +        +      +
HP-UX10.x                     +      +        +      +
IRIX6.5.x                     +      +        -      -
Linux-2.2.16/glibc-2.1.3      -      -        +      +
Linux-2.0.21/libc5            -      -        ?      ?
FreeBSD4.2                    ?      -        ?      ?
OpenBSD2.7                    ?      -        ?      ?
AIX4.3                        -      -        -      +
SunOS4.1.4                    ?      -        ?      ?
OSF1/V4                       -      -        -      -
   ["+": OK,  "-": locks up,  "?": couldn't try (not my boxes)]
(yes, I was rather curious)
Meanwhile I have quite a bunch of zombies on the AIX box, as the kernel
does not take care of these ghosts left behind by parents not wait()ing
properly :)  On Free/OpenBSD it usually took a few minutes for them to be
cleaned up, but some others are still lurking around.  I don't know the
kernel mechanism behind this.


