Handling SIGSEGV/SIGBUS with glibc 2.3.2
From: Yair Lenga
Subject: Handling SIGSEGV/SIGBUS with glibc 2.3.2
Date: Thu, 24 Jun 2004 12:55:50 -0400
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113
Hi,
I am porting a large program that runs on SGI/SunOS to RedHat Linux
AS3.0, using glibc 2.3.2. My question is:
How can I implement a cleanup function in response to SIGSEGV, SIGBUS and
other signals for which the signal handler cannot simply set a flag and
return?
Details:
The server programs use a signal handler to perform various cleanup
tasks, with a sequence like:
static void signal_handler(int signo) {
    ...
    if (serving_request) notify_remote_monitor(RPC_HAS_FAILED) ;
    system("record_crash") ;
    ...
}
notify_remote_monitor is an RPC call to a different server, notifying it
that a failure has occurred. This is a mandatory requirement.
The system worked fine for every signal with AS2.1 (glibc-2.2.4), both
for user-generated signals (SIGTERM ...) and for software failures
(SIGSEGV, SIGBUS). After switching to Advanced Server 3.0 (glibc-2.3.2,
using tls/libc.so), we found that in many cases the server hangs after
receiving a signal. Attaching GDB to the server showed that the signal
(in this case SIGCHLD==17) arrived during "free". The attempt to go into
"vfork" then spins forever on a mutex that was left locked by the
unfinished free. The problem can be reproduced with many signals, and in
general the sequence is:
* The program calls free
* free locks the arena and calls _int_free to release the memory
* A signal is received
* The signal handler is invoked and tries to call
free/malloc/vfork/... as part of the cleanup
* The process tries to lock the arena again and waits forever.
The documentation is very clear that a signal handler should do nothing
except set a flag recording the condition and return, with the normal
program flow checking that flag. I can implement this (with some effort)
for SIGTERM, SIGALRM, and other signals after which processing can
resume. But this approach does not work for SIGSEGV, SIGBUS, etc., where
the signal handler cannot usefully return (returning re-executes the
faulting instruction).
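A minimal sketch of that flag-and-poll pattern, using POSIX sigaction. The names pending_signal, record_signal, install_deferred_handler and take_pending_signal are hypothetical, and the sketch assumes a single-threaded server. For the SIGCHLD case in the trace, the work sig_chld currently does (restart_server, which forks) would move into the main loop and run after take_pending_signal() returns SIGCHLD.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

/* Written only by the handler, read by the main loop. volatile sig_atomic_t
   is the object type a handler is guaranteed to be able to write safely. */
static volatile sig_atomic_t pending_signal = 0;

/* The handler records which signal arrived and returns; it calls nothing. */
static void record_signal(int signo)
{
    pending_signal = signo;
}

/* Install record_signal for a signal that can be deferred (SIGTERM, SIGCHLD, ...). */
int install_deferred_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted slow system calls */
    return sigaction(signo, &sa, NULL);
}

/* Called from normal program flow, e.g. once per main-loop iteration.
   Returns the pending signal number (0 if none) and clears it. Cleanup that
   follows this call runs outside the handler, so it may use malloc, system()
   or RPC. Assumes a single-threaded process (sigprocmask). If several
   signals arrive before the next poll, only the most recent one is kept. */
int take_pending_signal(void)
{
    sigset_t all, old;
    int signo;

    /* Block signals so one arriving between the read and the reset is not lost. */
    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, &old);
    signo = pending_signal;
    pending_signal = 0;
    sigprocmask(SIG_SETMASK, &old, NULL);
    return signo;
}
```

In a loop like yb_svc_run, the check could sit at the top of each iteration; a nonzero result is the point where notify_remote_monitor and system("record_crash") could be called without the arena-lock problem.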
I tried using setjmp and longjmp to resume processing after SIGSEGV, but
that did not release the mutex left locked by the interrupted call.
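For reference, a sketch of the sigsetjmp/siglongjmp technique (the signal-mask-aware variants of setjmp/longjmp). It regains control after SIGSEGV, but, as the comments note, it cannot release a lock held by the code it jumps out of, which matches the behaviour described above. run_guarded and the demo_* functions are hypothetical names, and raise() stands in for a real invalid memory access.

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recovery_point;
static volatile sig_atomic_t caught_signal = 0;

/* Jumps out of the handler AND out of whatever function was interrupted.
   That function never finishes, so any lock it held (for example the malloc
   arena mutex inside free) is still held after the jump. */
static void jump_back(int signo)
{
    caught_signal = signo;
    siglongjmp(recovery_point, 1);
}

/* Runs fn(). Returns 0 if it completes, or SIGSEGV if it faults. */
int run_guarded(void (*fn)(void))
{
    struct sigaction sa, old;
    int result = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = jump_back;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    /* savemask = 1: the jump restores the mask saved here, so SIGSEGV
       (blocked while the handler runs) is unblocked again afterwards. */
    if (sigsetjmp(recovery_point, 1) == 0)
        fn();
    else
        result = caught_signal;

    sigaction(SIGSEGV, &old, NULL);   /* restore the previous disposition */
    return result;
}

/* Demo callbacks: raise() simulates a fault without undefined behaviour. */
void demo_faulting_call(void) { raise(SIGSEGV); }
void demo_clean_call(void) { }
```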
I hope that other people have some experience and/or ideas on how to
deal with SIGSEGV (and similar) signals.
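One commonly used workaround, not proposed in the post and sketched here only as an assumption, is to keep fatal-signal cleanup out of the crashing process entirely: a parent process forks the real server and waits for it, and when waitpid reports that the child was killed by SIGSEGV or SIGBUS, the parent runs the cleanup in ordinary code. supervise, demo_crashing_worker and demo_clean_worker are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs worker() in a child process and waits for it to finish.
   Returns the number of the signal that killed the child, 0 if it exited
   normally, or -1 on error. The parent is in normal program flow, not in a
   signal handler, so after a nonzero return it may safely call malloc,
   system() or an RPC such as notify_remote_monitor. */
int supervise(void (*worker)(void))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {           /* child: do the real work */
        worker();
        _exit(0);
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)   /* retry only if interrupted by a signal */
            return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Demo workers, for illustration only. */
void demo_crashing_worker(void)
{
    signal(SIGSEGV, SIG_DFL);   /* make sure the default action (terminate) applies */
    raise(SIGSEGV);             /* stands in for a real invalid access */
}

void demo_clean_worker(void) { }
```

The serving_request state from the original handler would still have to reach the parent some other way (for example through a pipe or shared memory); that part is not shown.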
Many thanks for any help,
Yair Lenga
Stack trace of the hung server (SIGCHLD arrived inside free):
(gdb) where
#0 0xb742c8dc in ptmalloc_lock_all () from /lib/tls/libc.so.6
#1 0xb7461796 in fork () from /lib/tls/libc.so.6
#2 0x0804ee17 in fork_process (c=0xbfff91d8 "/home/sb/book/sbyb/bin/mortserver", argv=0xbfff9128) at fork_process.c:15
#3 0x0804b0b6 in sched_fork_server (serverfile=0xbfff91d8 "/home/sb/book/sbyb/bin/mortserver", dblogin=0x80809bc "", mortdb=0x80809a8 "", port=4200, cpid=0) at yb_sched.c:376
#4 0x0804c18a in restart_server (tbl=0x8080f98, login=0x80809bc "", mortdb=0x80809a8 "", serverfile=0xbfff91d8 "/home/sb/book/sbyb/bin/mortserver", port=4200, cpid=0) at yb_sched.c:658
#5 0x0804c0fa in sig_chld (x=17) at yb_sched.c:642
#6 <signal handler called>
#7 0xb7429eca in _int_free () from /lib/tls/libc.so.6
#8 0xb7428e68 in free () from /lib/tls/libc.so.6
#9 0xb74bb8d8 in xdrrec_destroy () from /lib/tls/libc.so.6
#10 0xb74b8e53 in svctcp_destroy () from /lib/tls/libc.so.6
#11 0xb74b7d49 in svc_getreq_common_internal () from /lib/tls/libc.so.6
#12 0xb74b7b0f in svc_getreqset_internal () from /lib/tls/libc.so.6
#13 0x08053cd6 in yb_svc_run (lsock=3, str=0xbfffd4f4 "/home/sb/book/sbyb/bin/yb_scheduler") at yb_rpc_svc_lib.c:593
#14 0x080525d2 in enter_mainloop (lsock=3, cp=0xbfffd4f4 "/home/sb/book/sbyb/bin/yb_scheduler") at yb_rpc_svc_lib.c:250
#15 0x08058b45 in yb_server_main (argc=2, argv=0xbfff9df4) at yb_rpc_lmain.c:41
#16 0x0805726e in main (argc=2, argv=0xbfff9df4) at yb_rpc_main.cc:14
If anyone is interested, attached is a small program to replicate the
problem. It hangs for me on RedHat AS3.0; see the stack trace below.
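#include <signal.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>   /* _exit */

static void catch_me(int signo) ;
static void boom(int signo) ;

int main(void)
{
    signal(SIGALRM, boom) ;
    signal(SIGSEGV, catch_me) ;
    /* Deliberately invalid: freeing a string literal makes free() crash
       with SIGSEGV inside _int_free while the arena lock is held. */
    free("ab") ;
    return 0 ;
}

static void boom(int signo) {
    printf("bam\n") ;
    _exit(0) ;
}

/* Intentionally calls functions that are not async-signal-safe,
   to reproduce the hang. */
static void catch_me(int signo) {
    void *q ;
    signal(signo, SIG_DFL) ;
    printf("bim\n") ;
    system("echo system catch me") ;   /* the trace below hangs here */
    q = malloc(16) ;
    free(q) ;
    printf("bom\n") ;
    raise(signo);
}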
The stack trace:
gdb -p 14911
(gdb) where
#0 0xb758d241 in __lll_mutex_lock_wait () from /lib/tls/libc.so.6
#1 0xb7518e64 in _L_mutex_lock_2507 () from /lib/tls/libc.so.6
#2 0xb74e1e84 in system () from /lib/tls/libc.so.6
#3 0x0804848b in catch_me ()
#4 <signal handler called>
#5 0xb7515e6f in _int_free () from /lib/tls/libc.so.6
#6 0xb7514e68 in free () from /lib/tls/libc.so.6
#7 0x08048443 in main ()
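For contrast with catch_me, a sketch of a fatal-signal handler restricted to async-signal-safe calls: it writes a fixed message with write() to a descriptor opened in advance, then re-raises the signal with the default action. It does not perform the RPC notification itself, since that cannot be done safely from the handler; a separate process reading the descriptor would have to do it. report_fd, fatal_handler and install_fatal_handler are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Descriptor opened before any fault can occur (a log file, or a pipe or
   socket to a monitoring process). */
static int report_fd = -1;

static void fatal_handler(int signo)
{
    static const char msg[] = "fatal signal caught\n";

    /* write() and raise() are async-signal-safe; printf, system and free,
       as used in catch_me, are not. */
    if (report_fd >= 0) {
        ssize_t ignored = write(report_fd, msg, sizeof msg - 1);
        (void)ignored;   /* nothing useful can be done on failure here */
    }

    /* SA_RESETHAND restored SIG_DFL on entry and SA_NODEFER leaves the
       signal unblocked, so this raise() terminates the process with the
       original signal, and a waiting parent sees it via WTERMSIG. */
    raise(signo);
}

int install_fatal_handler(int signo, int fd)
{
    struct sigaction sa;

    report_fd = fd;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fatal_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND | SA_NODEFER;
    return sigaction(signo, &sa, NULL);
}
```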