monit-dev

Re: monit deadlock


From: Martin Pala
Subject: Re: monit deadlock
Date: Thu, 18 Sep 2003 00:03:20 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030908 Debian/1.4-4

Martin Pala wrote:

Jan-Henrik Haukeland wrote:

Martin Pala <address@hidden> writes:

(gdb) set stopped=1
(gdb) print stopped
$3 = 1
(gdb) continue
Continuing.
Program exited normally.
(gdb)


Hmm, I'm not sure. We use lots of signal blocks (which is good), but
here the TERM signal must have been blocked so do_destroy could not be
called. Try to move the 'do_signal_block();' from main() into
do_destroy *after* Run.stopped= TRUE;

Does it help?

I tried it, but it didn't help. From the gdb output it seems that signals work:

#0  0x4002a354 in __pthread_sigsuspend () from /lib/libpthread.so.0
#1  0x4002a118 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
#2  0x400274dc in pthread_join () from /lib/libpthread.so.0
#3  0x0804e7d7 in monit_http (action=-4) at monit_http.c:131
#4  0x0804f17e in do_destroy (sig=15) at monitor.c:512
#5  0x4002d685 in __pthread_sighandler () from /lib/libpthread.so.0
#6  <signal handler called>
#7  0x4002d91b in read () from /lib/libpthread.so.0
#8  0x00000006 in ?? ()
#9  0x00000038 in ?? ()
#10 0x0805c651 in read_proc_file (
buf=0xbfffe880 "10233 (mozilla-bin) S 10232 690 690 0 -1 64 23 0 5 0 141 117 0 0 9 0 0 0 1843919 89362432 14890 4294967295 134512640 134720822 3221224560 3212834796 1080776422 0 0 4098 17453 3222590806 0 0 33 0\n0 0 0"..., buf_size=4096, name=0xc3 <Address 0xc3 out of bounds>,
  pid=10233) at process/common.c:98
...

In frame #10 monit was doing its normal work; at frame #6 it was interrupted by the signal, which was handled by do_destroy. Monit should then call monit_http(STOP_HTTP), which sets the global httpd stopped flag to true. For some reason the stopped flag is still false, so the monit httpd doesn't know that it should stop. I see no reason why the flag would still be false. The only thing that looks strange to me is in frame #3:

monit_http (action=-4)

I don't usually use debuggers, so I'm not sure whether this is correct (STOP_HTTP is defined as 2 in monitor.h).

Martin




It is solved. It seems that the problem was caused by a race condition between monit_http(START_HTTP) and monit_http(STOP_HTTP). If the signal was delivered after the httpd thread_wrapper() had passed set_signal_block() but before it reached start_httpd(), the do_destroy() signal handler set the stopped flag to true and started to wait for the httpd thread to terminate. As soon as monit returned from the signal handler, it resumed in start_httpd(), which reset the stopped flag to false.
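
To make the ordering concrete, here is a minimal sketch of it. This is not the monit code: the names start_httpd_buggy, start_httpd_fixed, on_term and replay_race are hypothetical, the handler only sets the flag (it does not wait for a thread as do_destroy() does), and raise() stands in for a TERM signal arriving exactly in the window before start_httpd().

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for the global httpd "stopped" flag. */
static volatile sig_atomic_t stopped = 0;

/* Stand-in for the do_destroy() handler: it only requests the stop. */
static void on_term(int sig)
{
    (void)sig;
    stopped = 1;
}

/* Startup that clears the flag unconditionally, as described above. */
static void start_httpd_buggy(void)
{
    stopped = 0;
}

/* One possible alternative (an assumption, not necessarily the fix that
 * went into monit): startup leaves the flag alone, because it was already
 * initialised before the httpd thread was created. */
static void start_httpd_fixed(void)
{
    /* intentionally does not touch 'stopped' */
}

/* Replays the window and returns the final flag value:
 * 1 if the stop request survived startup, 0 if it was lost. */
static int replay_race(void (*start_httpd)(void))
{
    assert(start_httpd != NULL);

    signal(SIGTERM, on_term);
    stopped = 0;        /* initial state, set before the "thread" starts */
    raise(SIGTERM);     /* handler runs here, before start_httpd() */
    start_httpd();      /* the buggy variant overwrites the request */
    return (int)stopped;
}
```

With the unconditional reset, replay_race(start_httpd_buggy) returns 0, so the stop request is lost and anything waiting for httpd to exit waits forever. With replay_race(start_httpd_fixed) the flag stays at 1.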

Martin








