RE: [monit-dev] Startup problems for a monit service


From: Aaron Scamehorn
Subject: RE: [monit-dev] Startup problems for a monit service
Date: Fri, 21 Aug 2009 07:38:38 -0500

Hi Martin,

Thanks for clearing that up.  I was confused about the start delay setting.

I did run monit w/ the -v option, and it does appear to be the behavior that you describe.

It looks like the pidfile is not updated within 30 seconds, and monit thinks it failed.

I'll add the timeout option to the start program line as you recommend.

Thanks for your help.

Aaron

-----Original Message-----
From: address@hidden on behalf of Martin Pala
Sent: Thu 8/20/2009 3:43 PM
To: The monit developer list
Subject: Re: [monit-dev] Startup problems for a monit service

Hi Aaron,

regarding the start delay ... this is an option of the "set daemon" statement
and sets the start delay of monit itself - i.e. when monit starts, it waits
60s before starting service verification. You can set the start
timeout this way:

   start program = "/cogcap/ccts/bin/mdService start" with timeout 60 seconds


Regarding the "process is not running" message - it is possible that,
because your process was slow to start, it didn't update the pidfile
within 30s, so monit thought that it hadn't started (which is true at
that point in time). Setting the start timeout should fix the
problem. To debug, run monit with the -v option, which will provide more info.
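
As a sketch of where that line fits, here is the check block from the original message with only the start program line changed. It assumes that 60 seconds is long enough for mdService to write its pidfile; the right value depends on how slowly the application actually starts.

```
check process mdService
    with pidfile "/cogcap/ccts/var/run/mdService.magneto.pid"
    # Assumption: 60 seconds covers the slowest expected startup.
    start program = "/cogcap/ccts/bin/mdService start" with timeout 60 seconds
    stop program = "/cogcap/ccts/bin/mdService stop"
    if 10 restarts within 11 cycles then timeout
    if mem > 256 Mb then alert
    if cpu usage > 95% for 11 cycles then restart
    group base
```

The "with start delay 60" part of the "set daemon" line can stay or go independently, since it only delays monit's first verification cycle.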

Martin



On Aug 19, 2009, at 2:43 PM, Aaron Scamehorn wrote:

> Hello,
>
> This is monit version 5.0
> I'm having difficulty with one of our applications that I have monit
> set up to monitor.
>
> The pertinent config is below:
> # Monit Config file for Magneto
> set daemon 10 with start delay 60 # Poll at 10-second intervals
> set statefile /tmp/monit.state
> check process mdService
>     with pidfile "/cogcap/ccts/var/run/mdService.magneto.pid"
>     start program = "/cogcap/ccts/bin/mdService start"
>     stop program = "/cogcap/ccts/bin/mdService stop"
>     if 10 restarts within 11 cycles then timeout
>     if mem > 256 Mb then alert
>     if cpu usage > 95% for 11 cycles then restart
>     #if failed port 9998 then restart
>     group base
> It is a slow-to-start app, which is why I've commented out the port
> monitoring and added the start delay of 60.
>
>
> From /var/log/messages, I can see monit tries to start the process 
> at 00:15:01, and at 00:15:31 issues a failed to start.
>
> At 00:15:47 I see monit now tries a restart, and a fail at 00:16:17.
>
> This repeats 2 more times.
>
> What ends up happening is I'm left with 4 processes running, because 
> none of them actually failed to start.
>
> So, first question is why does monit issue the first failure after 
> only 30 seconds if my start delay is 60?
> How does monit determine that the startup was a failure?  I'm 
> certain that the pid file is in place and contains the correct pid.
>
> I guess my next step is to run monit -v?
>
> Any help would be appreciated.
>
> Thanks,
> Aaron
>
>
> Aug 19 00:15:01 magneto monit[7908]: 'mdService' start: /cogcap/ccts/bin/mdService
> Aug 19 00:15:31 magneto monit[7908]: 'mdService' failed to start
> Aug 19 00:15:36 magneto monit[7908]: 'mdService' start action done
> Aug 19 00:15:47 magneto monit[7908]: 'mdService' process is not running
> Aug 19 00:15:47 magneto monit[7908]: 'mdService' trying to restart
> Aug 19 00:15:47 magneto monit[7908]: 'mdService' start: /cogcap/ccts/bin/mdService
> Aug 19 00:16:17 magneto monit[7908]: 'mdService' failed to start
> Aug 19 00:16:27 magneto monit[7908]: 'mdService' process is not running
> Aug 19 00:16:27 magneto monit[7908]: 'mdService' trying to restart
> Aug 19 00:16:27 magneto monit[7908]: 'mdService' start: /cogcap/ccts/bin/mdService
> Aug 19 00:16:58 magneto monit[7908]: 'mdService' failed to start
> Aug 19 00:17:08 magneto monit[7908]: 'mdService' process is not running
> Aug 19 00:17:08 magneto monit[7908]: 'mdService' trying to restart
> Aug 19 00:17:08 magneto monit[7908]: 'mdService' start: /cogcap/ccts/bin/mdService
> Aug 19 00:17:38 magneto monit[7908]: 'mdService' failed to start
> Aug 19 00:17:48 magneto monit[7908]: 'mdService' process is not running
> Aug 19 00:17:48 magneto monit[7908]: 'mdService' trying to restart
> Aug 19 00:17:48 magneto monit[7908]: 'mdService' start: /cogcap/ccts/bin/mdService
> Aug 19 00:18:05 magneto monit[7908]: 'mdService' started
> Aug 19 00:18:15 magneto monit[7908]: 'mdService' process is running with pid 24160
> _______________________________________________
> monit-dev mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/monit-dev


