Re: monit restart bug/race condition (3.1 and current CVS version behavi
From: Martin Pala
Subject: Re: monit restart bug/race condition (3.1 and current CVS version behavior)
Date: Sun, 09 Feb 2003 22:00:53 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021226 Debian/1.2.1-9
I looked into the cause of it. There are two problems:
1.) the race condition already described
2.) the real reason why monit in daemon mode stops monitoring: the
do_restart() method sets the state of toggle_validate_flag as usual, but
the flag is not used in the subsequent http actions. Because restart is
based on a call to the stop command, monit httpd calls:
check_process(name, action, TRUE);
                            ^^^^^^
which toggles the do_validate flag. This is not related to the race
condition, but it makes the problem worse. I think problem 2.) could
easily be solved by implementing a 'restart' method in monit http. I have
attached a patch for it. It is not complete yet: the control.c methods
still need to be modified to call the 'restart' action. I will look at it
later (but I must leave the computer now, otherwise my wife will kill me ;)
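To illustrate the idea behind problem 2.), here is a minimal sketch in C. It is not monit's code: struct proc, model_check_process() and model_restart() are hypothetical names, and the sketch assumes that the third argument of check_process() means "also switch monitoring off on stop and back on on start". Under that assumption, a restart built from stop and start with the toggle disabled leaves the monitoring flag alone.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a monit process entry (not the real struct). */
struct proc {
    bool running;      /* is the service process alive? */
    bool do_validate;  /* is the service still being monitored? */
};

/* Hypothetical model of check_process(name, action, toggle).
 * Assumption: with toggle set, "stop" also disables monitoring
 * and "start" enables it again. */
static void model_check_process(struct proc *p, const char *action, bool toggle)
{
    if (strcmp(action, "stop") == 0) {
        p->running = false;
        if (toggle)
            p->do_validate = false;
    } else if (strcmp(action, "start") == 0) {
        p->running = true;
        if (toggle)
            p->do_validate = true;
    }
}

/* Restart as one action: stop, then start, never touching do_validate. */
static void model_restart(struct proc *p)
{
    model_check_process(p, "stop", false);
    model_check_process(p, "start", false);
}
```

In this model a plain stop with the toggle set leaves do_validate cleared, while model_restart() returns with the service running and still monitored.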
Martin
Martin Pala wrote:
Hi,
there has been a bug in monit since 3.1: if monit is running in daemon
mode and you restart a service by executing (for example):
unicorn:~#monit restart slapd
you get the following output in monit's log:
Feb 9 19:03:25 unicorn monit[3157]: stop: (slapd) /etc/init.d/slapd
Feb 9 19:03:26 unicorn monit[3155]: 'slapd' have valid checksums
Feb 9 19:03:26 unicorn monit[3155]: start: (slapd) /etc/init.d/slapd
Feb 9 19:03:26 unicorn monit[3157]: monit: Warning process 'slapd'
was not stopped
Feb 9 19:03:26 unicorn monit[3157]: Monitoring disabled -- process slapd
(Legend: 3155=monit main thread, 3157=monit httpd thread)
From then on, slapd is no longer monitored, although it is running.
The problem is that monit calls the stop method via monit httpd and waits
for the process to terminate. As soon as the process has stopped, the main
thread detects this and starts it again. Because the process has been
started again, the stop method run in parallel by monit httpd fails, so it
disables monitoring (this affects the common monit runtime environment).
The result is that the service is running but not monitored.
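The interleaving seen in the log can be replayed deterministically with a small model. This is only a sketch: struct service and the functions below are hypothetical and do not come from monit's sources. They mimic the three steps in the order the two threads performed them.

```c
#include <stdbool.h>

/* Hypothetical model of one monitored service (not monit's real struct). */
struct service {
    bool running;    /* is the process alive? */
    bool monitored;  /* does the main thread still watch it? */
};

/* Main thread poll cycle: restart a monitored service that is found dead. */
static void main_thread_poll(struct service *s)
{
    if (s->monitored && !s->running)
        s->running = true;          /* "start: (slapd) ..." */
}

/* httpd thread, step 1: run the stop program. */
static void httpd_stop_begin(struct service *s)
{
    s->running = false;             /* "stop: (slapd) ..." */
}

/* httpd thread, step 2: after waiting, check whether the stop succeeded.
 * If the process is alive, the stop counts as failed and monitoring
 * is switched off. */
static void httpd_stop_verify(struct service *s)
{
    if (s->running)
        s->monitored = false;       /* "Monitoring disabled" */
}

/* Replay the order of events from the log above. */
static struct service replay_race(void)
{
    struct service s = { .running = true, .monitored = true };
    httpd_stop_begin(&s);   /* 19:03:25, httpd thread stops the service  */
    main_thread_poll(&s);   /* 19:03:26, main thread starts it again     */
    httpd_stop_verify(&s);  /* 19:03:26, httpd thread sees it running    */
    return s;
}
```

replay_race() ends with running set and monitored cleared, which is the state described above: the service is up, but nothing watches it.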
You can reproduce the problem easily by running monit in daemon mode
with a wakeup interval of 1s.
I think Oliver saw this problem and his patch has a workaround for it,
which will work fine, but it would probably be better to solve the race
condition itself. What do you think (would someone look at it, or shall
I do it :) ?
Cheers,
Martin
diff -Naur monit/http/cervlet.c monit.restart/http/cervlet.c
--- monit/http/cervlet.c 2002-12-30 19:33:53.000000000 +0100
+++ monit.restart/http/cervlet.c 2003-02-09 21:50:40.000000000 +0100
@@ -567,6 +567,23 @@
}
+ if( is(action, "restart") ) {
+
+ if(p->start && p->stop) {
+
+ check_process(name, "stop", FALSE);
+ check_process(name, "start", FALSE);
+
+ } else {
+
+ send_error(res, SC_BAD_REQUEST,
+ "Start or stop method not defined for the process");
+ goto quit;
+
+ }
+
+ }
+
if(is(action, "status")) {
print_status(p, res);
@@ -822,6 +839,12 @@
"<input type=hidden value='stop' name=action>"
"<input type=submit value='Stop program' style='font-size: 12pt'></font>"
"</form></td>", name);
+ if(p->start && p->stop)
+ out_print(res,
+ "<td><form method=GET action=/%s>"
+ "<input type=hidden value='restart' name=action>"
+ "<input type=submit value='Restart program' style='font-size: 12pt'></font>"
+ "</form></td>", name);
out_print(res, "</tr></table>");
FOOT