RE: <service> start Generates email noise
From: Aaron Scamehorn
Subject: RE: <service> start Generates email noise
Date: Thu, 21 Dec 2006 13:13:21 -0600
Glad to hear it! Glad to help.
Also, thanks for pointing out the "every" directive. I think I might
want to start using it...
Thanks,
Aaron
-----Original Message-----
From: address@hidden
[mailto:address@hidden On
Behalf Of Martin Pala
Sent: Thursday, December 21, 2006 5:21 AM
To: The monit developer list
Subject: Re: <service> start Generates email noise
Thanks :)
I have reproduced the problem - it is fixed in CVS now. It was caused in
validate.c:check_skip() by the order of the s->def_every vs. s->visited
tests. The correct order is:
--8<--
if(s->visited) {
  DEBUG("'%s' check skipped -- service already handled "
        "in a dependency chain\n", s->name);
  return TRUE;
}
if(!s->def_every)
  return FALSE;
--8<--
When no 'every' statement was used, check_skip() returned FALSE before
the s->visited test was reached, so monit performed the service check in
the same cycle in which the user-requested start action was called.
Because the just-started process was not running yet, the process
existence test failed and monit performed the restart action in the same
cycle as well.
I had trouble reproducing it because I used an 'every' statement in the
testing configuration, which masked the bug.
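To make the ordering issue concrete, here is a minimal sketch, not the actual monit source. The service_t struct, the check_skip_old() and check_skip_fixed() names, and the pre-fix ordering are assumptions reconstructed from the description above; the real 'every' cycle counting is left out.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, stripped-down stand-in for monit's service record.
 * Only the fields discussed in this thread are modelled. */
typedef struct {
  const char *name;
  bool visited;   /* already handled earlier in this cycle */
  int def_every;  /* 0 when no 'every' statement is configured */
} service_t;

/* Assumed pre-fix order: the 'every' test comes first, so a service
 * without 'every' is never tested against the visited flag. */
bool check_skip_old(const service_t *s) {
  if (!s->def_every)
    return false;
  if (s->visited)
    return true;
  return false;  /* 'every' cycle counting omitted in this sketch */
}

/* Order after the fix: the visited test comes first. */
bool check_skip_fixed(const service_t *s) {
  if (s->visited) {
    printf("'%s' check skipped -- service already handled "
           "in a dependency chain\n", s->name);
    return true;
  }
  if (!s->def_every)
    return false;
  return false;  /* 'every' cycle counting omitted in this sketch */
}
```

For a service that was just started in the current cycle (visited set) and has no 'every' statement, check_skip_old() returns false, so the existence check runs right away, while check_skip_fixed() returns true and the check waits for the next cycle. With 'every' configured, both orderings reach the visited test in this sketch, which is consistent with the bug being masked in that configuration.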
Thanks for the help :)
Martin
Aaron Scamehorn wrote:
> Hi Martin,
>
> Thanks for the detailed description...
>
> I've attached my monitrc file. Obviously the executable (logClient)
> is an in-house exe, but that shouldn't matter, should it?
>
> I wonder if it is due to the amount of time it takes for the exe to
> update its pid file...
>
> Seems as though it has something to do with the wait_start starting
> its own thread to wait???
>
> The scenario is rather simple. I can reproduce by stopping the
> service, then issuing a monit <service> start via the CLI.
>
> If this is not enough detail, or I can help out more, please let me
> know.
>
> Thanks,
> Aaron
>
>
> -----Original Message-----
> From: address@hidden
> [mailto:address@hidden On
> Behalf Of Martin Pala
> Sent: Wednesday, December 06, 2006 8:50 AM
> To: The monit developer list
> Subject: Re: <service> start Generates email noise
>
> I have looked into it ...
>
> I will first explain how it works in monit 4.8.2:
>
> Two threads come into play:
> - http thread
> - monitoring thread
>
> The http thread processes the user-requested actions (posted either
> using the CLI or the HTML interface). The action to be done is scheduled
> in http/cervlet.c:handle_action() by setting the s->doaction flag for
> the appropriate service. When there is no action scheduled, the
> s->doaction flag is set to ACTION_IGNORE (in p.y during service
> initialization or in validate.c after it was handled). In addition,
> Run.doaction is set to TRUE to signal that there is some scheduled
> action in the service tree. The main monitoring thread is then woken up
> by the http thread to speed up the action handling.
>
> The main thread then checks in validate.c:validate() whether the
> Run.doaction flag is set, since user actions take priority. If it is
> set, it walks the service tree and for each service performs the
> scheduled s->doaction using control_service() and then resets the
> s->doaction flag to ACTION_IGNORE. This is all done under mutex and
> signal protection, so it cannot be interrupted and no race condition
> can occur. The only thread which can call control_service() and
> physically start/restart/etc. the service is the main thread.
> control_service() also sets the s->visited flag.
>
> The second service loop is then evaluated - monit walks the service
> tree and for each service locks the mutex and blocks signals. If the
> service was not already handled in the same cycle (the s->visited flag
> is tested in check_skip()), it checks the s->doaction flag again (to
> improve the response time for services that had an action scheduled
> between the first and second loop in the same cycle). If it is set, it
> performs the action; otherwise it checks the service.
>
>
> The design is similar to signal handling. The http thread just sets
> the flag, whereas the monitoring thread handles the action. From a
> theoretical point of view, I think no race condition can occur.
>
> I tried to reproduce the problem (official monit-4.8.2 release)
> without success.
>
> Can you prepare a simple monit configuration and a procedure to
> reproduce the problem?
>
> Thanks,
> Martin
>
>
>
> Aaron Scamehorn wrote:
>
>>Hi Martin,
>>
>>Actually I think you've now got one thread doing an ACTION_START, and
>>another doing an ACTION_RESTART on the exact same service.
>>
>>It is the ACTION_RESTART that is generating what I perceived to be
>>extraneous emails.
>>
>>It looks like the do_wakeupcall that you added to
>>http/cervlet.c:handle_action() is the culprit. Without it, I don't get
>>the ACTION_RESTART problem.
>>
>>Of course you need this now, or else it takes Poll Time to actually
>>respond to the HTTP events, which is what you were trying to speed up in
>>the first place.
>>
>>Here is the log output, with a bunch of extra messages, including
>>pthread_t.
>>
>>3086927552 [CST Dec 1 14:53:25] debug : 'data_dir' filesystem flags has not changed since last cycle
>>3086927552 [CST Dec 1 14:53:25] debug : 'data_dir' space usage check passed [current space usage=10.6%]
>>3086924720 [CST Dec 1 14:53:26] info : monit daemon at 24175 awakened
>>3086927552 [CST Dec 1 14:53:26] info : Awakened by User defined signal 1
>>3086927552 [CST Dec 1 14:53:26] debug : control_service: ACTION_START for 'LogClient'
>>3086927552 [CST Dec 1 14:53:26] debug : control_service: ACTION_START Util_isProcessRunning for 'LogClient'
>>3086927552 [CST Dec 1 14:53:26] debug : 'LogClient' Error testing process id [24220] -- No such process
>>3086927552 [CST Dec 1 14:53:26] debug : do_start: Util_isProcessRunning for 'LogClient'
>>3086927552 [CST Dec 1 14:53:26] debug : 'LogClient' Error testing process id [24220] -- No such process
>>3086927552 [CST Dec 1 14:53:26] info : 'LogClient' start: /cogcap/ccts/bin/logclnt
>>3086927552 [CST Dec 1 14:53:26] debug : 'LogClient' Error testing process id [24220] -- No such process
>>3086927552 [CST Dec 1 14:53:26] debug : Monitoring enabled -- service LogClient
>>3086927552 [CST Dec 1 14:53:26] debug : check_process: calling Util_isProcessRunning for 'LogClient'
>>3086927552 [CST Dec 1 14:53:26] debug : 'LogClient' Error testing process id [24220] -- No such process
>>3086927552 [CST Dec 1 14:53:26] error : 'LogClient' process is not running
>>3086927552 [CST Dec 1 14:53:26] debug : Does not exist notification is NOT sent to address@hidden
>>3086927552 [CST Dec 1 14:53:26] debug : Does not exist notification is sent to address@hidden
>>3076434864 [CST Dec 1 14:53:26] debug : static void* wait_start for 'LogClient'
>>3076434864 [CST Dec 1 14:53:26] debug : 1) wait_start: calling Util_isProcessRunning for 'LogClient', max_tries= 29
>>3076434864 [CST Dec 1 14:53:26] debug : 'LogClient' Error testing process id [24220] -- No such process
>>3086927552 [CST Dec 1 14:53:26] debug : control_service: ACTION_RESTART for 'LogClient'
>>3086927552 [CST Dec 1 14:53:26] info : 'LogClient' trying to restart
>>3086927552 [CST Dec 1 14:53:26] debug : Monitoring disabled -- service LogClient (stop)
>>3086927552 [CST Dec 1 14:53:26] debug : do_stop: Util_isProcessRunning for 'LogClient'
>>3086927552 [CST Dec 1 14:53:26] debug : 'LogClient' Error testing process id [24220] -- No such process
>>3086927552 [CST Dec 1 14:53:26] debug : 'data_dir' filesystem flags has not changed since last cycle
>>3086927552 [CST Dec 1 14:53:26] debug : 'data_dir' space usage check passed [current space usage=10.6%]
>>3076434864 [CST Dec 1 14:53:27] debug : 1) wait_start: calling Util_isProcessRunning for 'LogClient', max_tries= 28
>>3076434864 [CST Dec 1 14:53:27] debug : 2) wait_start: calling Util_isProcessRunning for 'LogClient'
>>3086927552 [CST Dec 1 14:53:56] debug : check_process: calling Util_isProcessRunning for 'LogClient'
>>3086927552 [CST Dec 1 14:53:56] info : 'LogClient' process is running with pid 24375
>>3086927552 [CST Dec 1 14:53:56] debug : Exists notification is NOT sent to address@hidden
>>3086927552 [CST Dec 1 14:53:56] debug : Exists notification is sent to address@hidden
>>3086927552 [CST Dec 1 14:53:56] debug : 'LogClient' zombie check passed [status_flag=0000]
>>3086927552 [CST Dec 1 14:53:56] debug : 'LogClient' loadavg(5min) check passed [current loadavg(5min)=0.2]
>>3086927552 [CST Dec 1 14:53:56] debug : 'LogClient' cpu usage check passed [current cpu usage=0.0%]
>>3086927552 [CST Dec 1 14:53:56] debug : 'LogClient' mem amount check passed [current mem amount=2764kB]
>>3086927552 [CST Dec 1 14:53:56] debug : 'data_dir' filesystem flags has not changed since last cycle
>>3086927552 [CST Dec 1 14:53:56] debug : 'data_dir' space usage check passed [current space usage=10.6%]
>>
>>
>>-----Original Message-----
>>From: address@hidden
>>[mailto:address@hidden On
>>Behalf Of Martin Pala
>>Sent: Thursday, November 30, 2006 4:20 PM
>>To: The monit developer list
>>Subject: Re: <service> start Generates email noise
>>
>>Hello,
>>
>>this behavior isn't a bug - the 'nonexist' event type has positive and
>>negative variants:
>>
>> Does not exist (positive 'nonexist')
>>
>> vs.
>>
>> Exists (negative 'nonexist')
>>
>>The alert statement allows filtering only by the general event type, not
>>by the particular polarity (there is no 'exist' option).
>>
>>=> when you have registered the 'nonexist' event, you should get two
>>alerts informing you about the beginning and the end of the problem.
>>
>>Martin
>>
>>
>>Aaron Scamehorn wrote:
>>
>>>Hello,
>>>
>>> From version 4.8 to 4.8.2, the following bug has been introduced:
>>>
>>>When we issue a monit <service> start command, we get a "Does not exist"
>>>email and a corresponding "Exists" email.
>>>
>>>Here is the debug output showing this behavior in 4.8.2:
>>>'LogClient' Error testing process id [11034] -- No such process
>>>'LogClient' Error testing process id [11034] -- No such process
>>>'LogClient' start: /cogcap/ccts/bin/logclnt
>>>'LogClient' Error testing process id [11034] -- No such process
>>>Monitoring enabled -- service LogClient
>>>'LogClient' Error testing process id [11034] -- No such process
>>>'LogClient' process is not running
>>>Does not exist notification is sent to address@hidden
>>>'LogClient' Error testing process id [11034] -- No such process
>>>'LogClient' trying to restart
>>>Monitoring disabled -- service LogClient (stop)
>>>'LogClient' Error testing process id [11034] -- No such process
>>>'LogClient' process is running with pid 11189
>>>Exists notification is sent to address@hidden
>>>'LogClient' zombie check passed [status_flag=0000]
>>>'LogClient' loadavg(5min) check passed [current loadavg(5min)=0.2]
>>>'LogClient' cpu usage check passed [current cpu usage=0.0%]
>>>'LogClient' mem amount check passed [current mem amount=2776kB]
>>>
>>>
>>>Under version 4.8, we don't get the annoying "Does not exist" and a
>>>corresponding "Exists" emails:
>>>
>>>'LogClient' Error testing process id [10970] -- No such process
>>>'LogClient' Error testing process id [10970] -- No such process
>>>'LogClient' start: /cogcap/ccts/bin/logclnt
>>>'LogClient' Error testing process id [10970] -- No such process
>>>Monitoring enabled -- service LogClient
>>>'LogClient' Error testing process id [10970] -- No such process
>>>'LogClient' Error testing process id [10970] -- No such process
>>>'LogClient' zombie check passed [status_flag=0000]
>>>'LogClient' loadavg(5min) check passed [current loadavg(5min)=0.1]
>>>'LogClient' cpu usage check passed [current cpu usage=0.0%]
>>>'LogClient' mem amount check passed [current mem amount=2776kB]
>>>
>>>
>>>
>>>Additionally, in our config file, we have the following set:
>>>set alert address@hidden only on { nonexist, exec, connection }
>>>
>>>We shouldn't be getting an "Exists" email under any circumstance, should
>>>we?
>>>
>>>Thanks,
>>>Aaron
>>>
>>>
>>>
>>>_______________________________________________
>>>monit-dev mailing list
>>>address@hidden
>>>http://lists.nongnu.org/mailman/listinfo/monit-dev