monit-dev

Re: 4.0 showstopper?


From: Martin Pala
Subject: Re: 4.0 showstopper?
Date: Wed, 17 Sep 2003 22:30:51 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030908 Debian/1.4-4

Jan-Henrik Haukeland wrote:

Martin Pala <address@hidden> writes:

So, thanks to Hauk the problem with monit crashing was solved :)

It was teamwork; if you had not analyzed this I would not have thought
about sendmail :-)

The second (non-critical) reported problem remains:

--SNIP--
- synchronize the main and wait_start threads so that a service which
is in the wait_start stage is not checked. This is a standalone problem:
monit can try to start the service in parallel without really waiting
for the service to start.
--SNIP--

It is not dangerous now that sendmail() has been fixed, but it is not
correct. I think we should skip the service check when the service
is in the wait_start stage. I can look into it if you agree.

Since wait_start() only waits for Run.polltime and there should
only be unique services in the monitrc file, I think the possibility
of monit starting a service in parallel is microscopic, unless I missed
something?

You are right, but I think this race condition can occur when:

- more than one monitored service uses the same start method, or
- more than one test inside a monitored service uses the same start method

The first case (more than one service using the same start method) looks like a configuration error (a dependency can be used instead), but the second case can occur in practice, for example:

 check process myprocess with pidfile /var/run/myprocess.pid
   start program = "/etc/init.d/myprocess start"
   stop program = "/etc/init.d/myprocess stop"
   if failed port 80 then restart
   if failed port 443 then restart

Monit will test all ports regardless of the individual results. If the first test fails, monit will call the stop and start methods via the restart event and continue testing immediately. In the (special) case that the start method is slow, it can collide with the second test, which will trigger a restart event too => the two restarts (stop and start methods) will "fight" and the result is unpredictable.
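
The collision described above can be modelled with a small sketch. This is not monit's code: the Service class, its starting flag, and the validate() function are hypothetical names invented for illustration, and a real start would run asynchronously rather than just setting a flag. It only shows that two failing tests in one cycle cause two restarts, and that skipping the remaining tests while a start is pending would reduce this to one.

```python
# Hypothetical model of one validation cycle; not monit's actual code.
class Service:
    def __init__(self, name):
        self.name = name
        self.starting = False   # True while a start is still in progress
        self.restarts = 0       # how many times stop + start was invoked

    def restart(self):
        # Stand-in for running the stop and start programs.
        self.restarts += 1
        self.starting = True


def validate(service, port_ok, skip_while_starting):
    """Run the port tests in order; restart the service on each failure."""
    for port, ok in port_ok.items():
        if skip_while_starting and service.starting:
            # Assumed guard: a start is pending, so stop testing this cycle.
            break
        if not ok:
            service.restart()
    return service.restarts


failing = {80: False, 443: False}   # both tests fail in the same cycle
unguarded = validate(Service("myprocess"), failing, skip_while_starting=False)
guarded = validate(Service("myprocess"), failing, skip_while_starting=True)
print(unguarded, guarded)
```

Without the guard the second failing test issues another restart while the first start may still be running; with the guard the cycle ends after the first restart.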

This is just a theory based on reading the code; maybe I'm wrong.

Martin






