monit-dev


From: Martin Pala
Subject: Re: monit restart bug/race condition (3.1 and current CVS version behavior)
Date: Sun, 09 Feb 2003 22:00:53 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021226 Debian/1.2.1-9

I looked into the cause. There are two problems:

1.) the race condition described in my previous mail (quoted below)
2.) the real reason why monit in daemon mode stops monitoring: the do_restart() method sets toggle_validate_flag as usual, but the flag is not used in the subsequent http actions. Because restart is built on the stop command, monit httpd calls:

check_process(name, action, TRUE);
                            ^^^^

which toggles the do_validate flag. This isn't related to the race condition, but it makes the problem worse. I think problem 2.) could be solved easily by implementing a 'restart' method in monit http. I've attached a patch for it. It isn't complete yet: the control.c methods still need to be modified to call the 'restart' action. I'll look into it later (but I must leave the computer now, otherwise my wife will kill me ;)
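
To illustrate problem 2.), here is a minimal sketch of how a toggle argument on the stop action can leave a service unmonitored after a restart. It is not monit's actual code: struct process, check_process_sketch() and restart_sketch() are hypothetical names, and the assumption that the flag is cleared on stop and set on start is mine.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a monitored process entry (not monit's type). */
struct process {
    bool running;      /* is the process alive? */
    bool do_validate;  /* is the process still being monitored? */
};

/*
 * Hypothetical model of an action handler with a toggle argument.
 * Assumption: with toggle set, "stop" switches monitoring off and
 * "start" switches it back on.
 */
static void check_process_sketch(struct process *p, const char *action,
                                 bool toggle)
{
    if (strcmp(action, "stop") == 0) {
        p->running = false;
        if (toggle)
            p->do_validate = false;
    } else if (strcmp(action, "start") == 0) {
        p->running = true;
        if (toggle)
            p->do_validate = true;
    }
}

/* Restart in the style of the attached patch: neither step toggles the flag. */
static void restart_sketch(struct process *p)
{
    check_process_sketch(p, "stop", false);
    check_process_sketch(p, "start", false);
}
```

In this model, a restart that goes through a toggling stop and is never followed by a toggling start ends with do_validate cleared, while restart_sketch() leaves the flag as it found it.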

Martin

Martin Pala wrote:

Hi,

there's a bug in monit since 3.1: if monit is running in daemon mode and you restart a service by executing (for example):

unicorn:~#monit restart slapd

you get the following output in monit's log:

Feb  9 19:03:25 unicorn monit[3157]: stop: (slapd) /etc/init.d/slapd
Feb  9 19:03:26 unicorn monit[3155]: 'slapd' have valid checksums
Feb  9 19:03:26 unicorn monit[3155]: start: (slapd) /etc/init.d/slapd
Feb  9 19:03:26 unicorn monit[3157]: monit: Warning process 'slapd' was not stopped
Feb  9 19:03:26 unicorn monit[3157]: Monitoring disabled -- process slapd

(Legend: 3155=monit main thread, 3157=monit httpd thread)

After that, slapd is no longer monitored, although it is still running.

The problem is that monit calls the stop method via monit httpd and waits for the process to terminate. As soon as the process has stopped, the main thread detects this and starts it again. Because the process is running again, the stop method run in parallel by monit httpd fails and therefore disables monitoring (this affects the common monit runtime environment). The result is that the service is running but not monitored.
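
The interleaving described above can be sketched as a small deterministic model. This is only an illustration under my own assumptions, not monit's code: struct service, main_thread_poll(), httpd_stop() and httpd_restart() are hypothetical names, and the race is reduced to a single flag saying whether a main-thread poll cycle lands while the httpd thread is waiting for termination.

```c
#include <stdbool.h>

/* Hypothetical model of one monitored service (not monit's real types). */
struct service {
    bool running;    /* is the process alive? */
    bool monitored;  /* does the main thread still validate it? */
};

/* One poll cycle of the main thread: restart a monitored service that is down. */
static void main_thread_poll(struct service *s)
{
    if (s->monitored && !s->running)
        s->running = true;
}

/*
 * Stop request as handled by the httpd thread. poll_during_wait models
 * whether a main-thread poll cycle lands between issuing the stop command
 * and checking that the process is gone. Returns true if the stop was
 * confirmed.
 */
static bool httpd_stop(struct service *s, bool poll_during_wait)
{
    s->running = false;            /* run the stop command */
    if (poll_during_wait)
        main_thread_poll(s);       /* main thread restarts the process first */
    if (s->running) {
        s->monitored = false;      /* "was not stopped": monitoring disabled */
        return false;
    }
    return true;
}

/* Restart as stop followed by start; gives up if the stop is not confirmed. */
static void httpd_restart(struct service *s, bool poll_during_wait)
{
    if (httpd_stop(s, poll_during_wait))
        s->running = true;         /* run the start command */
}
```

Calling httpd_restart() with poll_during_wait set reproduces the end state from the log in this model: the service is running, but monitored is false. Without the interfering poll, the service ends up running and still monitored.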

You can reproduce the problem easily by running monit in daemon mode with a wakeup interval of 1s.

I think Oliver saw this problem and his patch contains a workaround for it, which will work fine, but it would probably be better to fix the race condition itself. What do you think (would someone else look into it, or shall I do it :) ?

Cheers,
Martin






diff -Naur monit/http/cervlet.c monit.restart/http/cervlet.c
--- monit/http/cervlet.c        2002-12-30 19:33:53.000000000 +0100
+++ monit.restart/http/cervlet.c        2003-02-09 21:50:40.000000000 +0100
@@ -567,6 +567,23 @@
 
       }
        
+      if( is(action, "restart") ) {
+
+       if(p->start && p->stop) {
+
+          check_process(name, "stop", FALSE);
+         check_process(name, "start", FALSE);
+
+       } else {
+
+          send_error(res, SC_BAD_REQUEST,
+           "Start or stop method not defined for the process");
+         goto quit;
+
+       }
+
+      }
+       
       if(is(action, "status")) {
 
        print_status(p, res);
@@ -822,6 +839,12 @@
        "<input type=hidden value='stop' name=action>"
        "<input type=submit value='Stop program' style='font-size: 12pt'></font>"
        "</form></td>", name);
+  if(p->start && p->stop)
+    out_print(res, 
+       "<td><form method=GET action=/%s>"
+       "<input type=hidden value='restart' name=action>"
+       "<input type=submit value='Restart program' style='font-size: 12pt'></font>"
+       "</form></td>", name);
   out_print(res, "</tr></table>");
 
   FOOT
