bug-bash

Re: Signal handler may hang in futex_wait on SMP


From: Chet Ramey
Subject: Re: Signal handler may hang in futex_wait on SMP
Date: Fri, 26 Feb 2010 13:57:50 -0500
User-agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.7) Gecko/20100111 Lightning/1.0b1 Thunderbird/3.0.1

On 2/25/10 7:38 AM, werner@suse.de wrote:
> Configuration Information [Automatically generated, do not change]:
> Machine: i586
> OS: linux-gnu
> Compiler: gcc -I/usr/src/packages/BUILD/bash-4.1 
> -L/usr/src/packages/BUILD/bash-4.1/../readline-6.1
> Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='i586' 
> -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i586-suse-linux-gnu' 
> -DCONF_VENDOR='suse' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL 
> -DHAVE_CONFIG_H   -I.  -I. -I./include -I./lib   -O2 -march=i586 -mtune=i686 
> -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector 
> -funwind-tables -fasynchronous-unwind-tables -g -D_LARGEFILE64_SOURCE 
> -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DRECYCLES_PIDS -Wall -g -std=gnu89 
> -Wextra -Wno-unprototyped-calls -Wno-switch-enum -Wno-unused-variable 
> -Wno-unused-parameter -ftree-loop-linear -pipe -fprofile-use
> uname output: Linux boole 2.6.27.19-3.2-pae #1 SMP 2009-02-25 15:40:44 +0100 
> i686 i686 i386 GNU/Linux
> Machine Type: i586-suse-linux-gnu
> 
> Bash Version: 4.1, 4.0, 3.2, 3.1, 3.0
> Patch Level: all
> Release Status: release
> 
> Description:
>       Signal handler may hang in futex_wait() on fast multi processor systems.

This doesn't mean much.

>         This seems to be caused by using stdio within signal handlers in
>         some cases where glibc uses malloc()/free() internally.

OK, let's set some baselines here.  Most of the time, a received signal
causes bash to set an internal flag and defer the actual handling of the
signal until later.  This includes signals for which a user has set a trap.
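A minimal sketch of that "set a flag now, handle it later" pattern, in
standalone C.  This is not bash's code: the names deferring_handler,
install_deferring_handler, run_pending_traps and MAX_SIGNALS are
hypothetical, and the real bookkeeping in bash is considerably more
involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical upper bound on signal numbers, for illustration only. */
#define MAX_SIGNALS 65

/* One flag per signal, plus a summary flag.  sig_atomic_t is the only
   object type a handler may portably write to. */
static volatile sig_atomic_t pending_signals[MAX_SIGNALS];
static volatile sig_atomic_t any_pending;

/* The handler does nothing except record that the signal arrived:
   no stdio, no malloc()/free(), no trap execution. */
static void deferring_handler(int sig)
{
    if (sig > 0 && sig < MAX_SIGNALS) {
        pending_signals[sig] = 1;
        any_pending = 1;
    }
}

/* Install the flag-setting handler for one signal.
   Returns 0 on success, -1 on failure (as sigaction does). */
int install_deferring_handler(int sig)
{
    struct sigaction sa;

    assert(sig > 0 && sig < MAX_SIGNALS);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = deferring_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}

/* Called from the main loop at a point where it is safe to do real work.
   Returns the number of pending signals it processed. */
int run_pending_traps(void)
{
    int sig, processed = 0;

    if (!any_pending)
        return 0;
    any_pending = 0;
    for (sig = 1; sig < MAX_SIGNALS; sig++) {
        if (pending_signals[sig]) {
            pending_signals[sig] = 0;
            processed++;    /* a shell would run the trap command here */
        }
    }
    return processed;
}
```

A caller would install the handler once and then call run_pending_traps()
between commands; everything that is not signal safe stays out of the
handler itself.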

The problem appears to be that, under certain circumstances, bash sets an
internal flag indicating that a signal should be processed immediately
instead of waiting for a "good time".  When bash then receives a signal for
which a trap is set, it runs the trap handler immediately, from signal
handler context, and that causes glibc to execute functions that are not
"signal safe" and cannot cope with being interrupted or re-entered.

This report doesn't include the most basic information: the signal bash
receives that causes this (not all signals are treated identically), and
the contents of the trap handler.  It doesn't even say whether bash is
interactive or not, or under what circumstances it's executing.  Let's
start there.

(Since I cheated and looked back at previous reports from Novell, I'm
going to assume the shell is not interactive while this is happening.)

Bash sets this flag under two basic circumstances: when it will potentially
block in a state that will not be interruptible (e.g., reading from a
remote file system), or when reading from the keyboard, when users expect
read(2) to be interrupted and any trap to be taken immediately.

The first case doesn't seem to apply.  The second case is primarily used
when the shell is interactive, so it doesn't apply either.  The
remaining potential places where the "interrupt_immediately" variable is
set are during the execution of the wait and read builtins and the
unwind-protect framework.

It should be possible for those folks who can reproduce this issue to
instrument bash in such a way as to track the value of
"interrupt_immediately" and notify when it changes.  That could mean
outputting some message when it's incremented and decremented, or using
something like gdb's watchpoints.  If we're going to assume that the
variable is being inappropriately set (or not reset), the way to a robust
fix is to find out where and why that's happening.

One strategy I've used in the past is to assign a numeric tag to each place
where the variable is modified, and write a message that includes the tag
and the variable's value when the variable is modified.  It's never
been a problem for me to use stdio to do this, but it may be different on
Linux (I don't do the majority of my development on Linux).  The increments
and decrements should match, and there should always be a corresponding
assignment of 0 after an assignment of 1.
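One way the tagging strategy might look, as a standalone sketch.  The
macros II_INC, II_DEC and II_SET, the helper trace_change and the counters
are all hypothetical; only the variable name interrupt_immediately comes
from bash, and here it is a local stand-in rather than bash's own variable.
The sketch formats with snprintf() and emits with write(2) instead of a
stdio stream, on the assumption that stdio may be part of the problem on
the affected systems.  snprintf() is not guaranteed to be signal safe
either, so the macros are meant for ordinary code paths, not for use inside
a handler.

```c
#include <stdio.h>
#include <unistd.h>

/* Local stand-in for bash's variable of the same name. */
int interrupt_immediately = 0;

/* Running totals, so a mismatch between increments and decrements
   can be spotted at the end of a run. */
int ii_increments = 0;
int ii_decrements = 0;

/* Write "ii[TAG] OP -> VALUE" to stderr with a single write(2) call. */
static void trace_change(int tag, const char *op)
{
    char buf[64];
    int n = snprintf(buf, sizeof buf, "ii[%d] %s -> %d\n",
                     tag, op, interrupt_immediately);

    if (n < 0)
        return;
    if ((size_t)n >= sizeof buf)
        n = (int)sizeof buf - 1;        /* output was truncated */
    if (write(STDERR_FILENO, buf, (size_t)n) < 0) {
        /* tracing is best effort; ignore write errors */
    }
}

/* Each call site gets its own numeric tag. */
#define II_INC(tag) \
    do { interrupt_immediately++; ii_increments++; \
         trace_change((tag), "inc"); } while (0)

#define II_DEC(tag) \
    do { interrupt_immediately--; ii_decrements++; \
         trace_change((tag), "dec"); } while (0)

#define II_SET(tag, value) \
    do { interrupt_immediately = (value); \
         trace_change((tag), "set"); } while (0)
```

Replacing each "interrupt_immediately++" with II_INC(n), and so on, with a
distinct n per site, gives a log in which an unmatched increment or a
missing reset to 0 can be traced back to the place that caused it.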

> Fix:
>         For the malloc()/free() used by the bash (configured with
>         --without-gnu-malloc and --without-bash-malloc) I use the patch
>         below but this does not work for the malloc()/free() calls used
>         internally in glibc.  A real solution could be the way done in
>         tcsh or ksh where only flags will be set from the signal handlers
>         whereas the real work is done within the main loop itself.
> 
> --- parse.y
> +++ parse.y   2010-01-20 13:51:39.000000000 +0000
> @@ -1434,10 +1434,11 @@ yy_readline_get ()
>                                                 current_readline_prompt : "");
>  
>        terminate_immediately = 0;
> -      if (signal_is_ignored (SIGINT) == 0 && old_sigint)
> +      if (signal_is_ignored (SIGINT) == 0)
>       {
>         interrupt_immediately--;
> -       set_signal_handler (SIGINT, old_sigint);
> +       if (old_sigint)
> +         set_signal_handler (SIGINT, old_sigint);
>       }

This patch is ok, in that it makes the code more symmetric, but it's
probably not relevant to this issue.  This code is used when the shell
is interactive, at which point the SIGINT handler has already been set
to a known value and will not be NULL.

>  #if 0
> --- xmalloc.c
> +++ xmalloc.c 2010-02-24 08:32:51.452626384 +0000
> @@ -35,6 +35,11 @@
>  #  include "ansi_stdlib.h"
>  #endif /* HAVE_STDLIB_H */

This isn't a fix.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    chet@case.edu    http://cnswww.cns.cwru.edu/~chet/
