monit-dev

Re: <service> start Generates email noise


From: Martin Pala
Subject: Re: <service> start Generates email noise
Date: Wed, 06 Dec 2006 15:50:19 +0100
User-agent: Thunderbird 1.5.0.8 (X11/20061109)

I have looked into it ...

I will first explain how it works in monit 4.8.2:

Two threads come into play:
- http thread
- monitoring thread

The http thread processes the actions requested by the user (posted through either the CLI or the HTML interface). An action is scheduled in http/cervlet.c:handle_action() by setting the s->doaction flag of the appropriate service. When no action is scheduled, s->doaction is set to ACTION_IGNORE (in p.y during service initialization, or in validate.c after the action has been handled). In addition, Run.doaction is set to TRUE to signal that some action is scheduled somewhere in the service tree. The http thread then wakes up the main monitoring thread to speed up the action handling.

In validate.c:validate() the main thread first checks whether the Run.doaction flag is set, since user actions take priority. If it is set, the thread walks the service tree and, for each service, performs the scheduled s->doaction using control_service() and then resets s->doaction to ACTION_IGNORE. All of this is done under mutex and signal protection, so it cannot be interrupted and no race condition can occur. The only thread that can call control_service() and physically start, restart, etc. a service is the main thread. control_service() also sets the s->visited flag.

The second service loop is then evaluated: monit walks the service tree and, for each service, locks the mutex and blocks signals. If the service was not already handled in the same cycle (the s->visited flag is tested in check_skip), monit checks the s->doaction flag again; this improves the response time for services that had an action scheduled between the first and second loop of the same cycle. If the flag is set, monit performs the action; otherwise it checks the service.


The design is similar to signal handling: the http thread just sets the flag, whereas the monitoring thread handles the action. From a theoretical point of view, I think no race condition can occur.
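
To make the two-loop design easier to follow, here is a minimal Python model of it. It is only a sketch of the description above, not the monit source: the class names, the log list, and the point at which the visited flag is cleared are assumptions made for illustration, and a single lock stands in for the mutex plus signal blocking.

```python
from dataclasses import dataclass
from enum import Enum
from threading import Lock


class Action(Enum):
    IGNORE = "ignore"
    START = "start"
    RESTART = "restart"


@dataclass
class Service:
    name: str
    doaction: Action = Action.IGNORE   # models s->doaction
    visited: bool = False              # models s->visited


class Monitor:
    def __init__(self, services):
        self.services = list(services)
        self.doaction = False          # models Run.doaction
        self.lock = Lock()             # stands in for mutex + signal blocking
        self.log = []                  # hypothetical record of what was done

    def handle_action(self, name, action):
        """HTTP-thread side: record the request, never act on it."""
        with self.lock:
            for s in self.services:
                if s.name == name:
                    s.doaction = action
                    self.doaction = True

    def control_service(self, s):
        """Monitoring-thread side: perform and clear the scheduled action."""
        self.log.append((s.name, s.doaction.value))
        s.doaction = Action.IGNORE
        s.visited = True

    def validate(self):
        """One monitoring cycle, run by the monitoring thread only."""
        # First loop: user-requested actions take priority.
        with self.lock:
            if self.doaction:
                for s in self.services:
                    if s.doaction is not Action.IGNORE:
                        self.control_service(s)
                self.doaction = False
        # Second loop: skip services already handled in this cycle,
        # re-check the flag, otherwise run the normal check.
        for s in self.services:
            with self.lock:
                if s.visited:
                    s.visited = False  # assumption: cleared for the next cycle
                elif s.doaction is not Action.IGNORE:
                    self.control_service(s)
                    s.visited = False
                else:
                    self.log.append((s.name, "check"))
```

In this model, scheduling a start for one service and running a single cycle records the start for that service and a plain check for every other service; the started service is not checked again until the next cycle.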

I tried to reproduce the problem (official monit-4.8.2 release) without success.

Can you prepare a simple monit configuration and a procedure for reproducing the problem?

Thanks,
Martin



Aaron Scamehorn wrote:
Hi Martin,

Actually I think you've now got one thread doing an ACTION_START, and
another doing an ACTION_RESTART on the exact same service.

It is the ACTION_RESTART that is generating what I perceived to be
extraneous emails.
It looks like the do_wakeupcall that you added to
http/cervlet.c:handle_action() is the culprit.  Without it, I don't get
the ACTION_RESTART problem.

Of course you need this now, or else it takes Poll Time to actually
respond to the HTTP events, which is what you were trying to speed up in
the first place.

Here is the log output, with a bunch of extra messages, including
pthread_t.

3086927552 [CST Dec  1 14:53:25] debug    : 'data_dir' filesystem flags has not changed since last cycle
3086927552 [CST Dec  1 14:53:25] debug    : 'data_dir' space usage check passed [current space usage=10.6%]
3086924720 [CST Dec  1 14:53:26] info     : monit daemon at 24175 awakened
3086927552 [CST Dec  1 14:53:26] info     : Awakened by User defined signal 1
3086927552 [CST Dec  1 14:53:26] debug    : control_service: ACTION_START for 'LogClient'
3086927552 [CST Dec  1 14:53:26] debug    : control_service: ACTION_START Util_isProcessRunning for 'LogClient'
3086927552 [CST Dec  1 14:53:26] debug    : 'LogClient' Error testing process id [24220] -- No such process
3086927552 [CST Dec  1 14:53:26] debug    : do_start: Util_isProcessRunning for 'LogClient'
3086927552 [CST Dec  1 14:53:26] debug    : 'LogClient' Error testing process id [24220] -- No such process
3086927552 [CST Dec  1 14:53:26] info     : 'LogClient' start: /cogcap/ccts/bin/logclnt
3086927552 [CST Dec  1 14:53:26] debug    : 'LogClient' Error testing process id [24220] -- No such process
3086927552 [CST Dec  1 14:53:26] debug    : Monitoring enabled -- service LogClient
3086927552 [CST Dec  1 14:53:26] debug    : check_process: calling Util_isProcessRunning for 'LogClient'
3086927552 [CST Dec  1 14:53:26] debug    : 'LogClient' Error testing process id [24220] -- No such process
3086927552 [CST Dec  1 14:53:26] error    : 'LogClient' process is not running
3086927552 [CST Dec  1 14:53:26] debug    : Does not exist notification is NOT sent to address@hidden
3086927552 [CST Dec  1 14:53:26] debug    : Does not exist notification is sent to address@hidden
3076434864 [CST Dec  1 14:53:26] debug    : static void* wait_start for 'LogClient'
3076434864 [CST Dec  1 14:53:26] debug    : 1) wait_start: calling Util_isProcessRunning for 'LogClient', max_tries= 29
3076434864 [CST Dec  1 14:53:26] debug    : 'LogClient' Error testing process id [24220] -- No such process
3086927552 [CST Dec  1 14:53:26] debug    : control_service: ACTION_RESTART for 'LogClient'
3086927552 [CST Dec  1 14:53:26] info     : 'LogClient' trying to restart
3086927552 [CST Dec  1 14:53:26] debug    : Monitoring disabled -- service LogClient (stop)
3086927552 [CST Dec  1 14:53:26] debug    : do_stop: Util_isProcessRunning for 'LogClient'
3086927552 [CST Dec  1 14:53:26] debug    : 'LogClient' Error testing process id [24220] -- No such process
3086927552 [CST Dec  1 14:53:26] debug    : 'data_dir' filesystem flags has not changed since last cycle
3086927552 [CST Dec  1 14:53:26] debug    : 'data_dir' space usage check passed [current space usage=10.6%]
3076434864 [CST Dec  1 14:53:27] debug    : 1) wait_start: calling Util_isProcessRunning for 'LogClient', max_tries= 28
3076434864 [CST Dec  1 14:53:27] debug    : 2) wait_start: calling Util_isProcessRunning for 'LogClient'
3086927552 [CST Dec  1 14:53:56] debug    : check_process: calling Util_isProcessRunning for 'LogClient'
3086927552 [CST Dec  1 14:53:56] info     : 'LogClient' process is running with pid 24375
3086927552 [CST Dec  1 14:53:56] debug    : Exists notification is NOT sent to address@hidden
3086927552 [CST Dec  1 14:53:56] debug    : Exists notification is sent to address@hidden
3086927552 [CST Dec  1 14:53:56] debug    : 'LogClient' zombie check passed [status_flag=0000]
3086927552 [CST Dec  1 14:53:56] debug    : 'LogClient' loadavg(5min) check passed [current loadavg(5min)=0.2]
3086927552 [CST Dec  1 14:53:56] debug    : 'LogClient' cpu usage check passed [current cpu usage=0.0%]
3086927552 [CST Dec  1 14:53:56] debug    : 'LogClient' mem amount check passed [current mem amount=2764kB]
3086927552 [CST Dec  1 14:53:56] debug    : 'data_dir' filesystem flags has not changed since last cycle
3086927552 [CST Dec  1 14:53:56] debug    : 'data_dir' space usage check passed [current space usage=10.6%]
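
For anyone who wants to compare which thread logged which control_service action, a small helper along these lines may be useful. It is a sketch only: it assumes each log entry has been joined onto a single line, that the leading number is the pthread_t value added by the extra debug messages, and the function name actions_by_thread is made up for this example.

```python
import re
from collections import defaultdict

# thread id, bracketed timestamp, level, then the message text
LINE = re.compile(
    r"^(?P<tid>\d+) \[(?P<ts>[^\]]+)\] (?P<level>\w+)\s*: (?P<msg>.*)$"
)
# only the entry that announces the action itself, e.g.
# "control_service: ACTION_START for 'LogClient'"
ACTION = re.compile(r"control_service: (ACTION_\w+) for")


def actions_by_thread(lines):
    """Map each thread id to the control_service actions it logged, in order."""
    found = defaultdict(list)
    for line in lines:
        m = LINE.match(line)
        if m is None:
            continue  # not a log entry in the expected shape
        action = ACTION.match(m.group("msg"))
        if action:
            found[m.group("tid")].append(action.group(1))
    return dict(found)
```

Feeding it the entries above returns one list of actions per thread id, which makes it straightforward to see whether ACTION_START and ACTION_RESTART carry the same id or different ones.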


-----Original Message-----
From: address@hidden
[mailto:address@hidden On
Behalf Of Martin Pala
Sent: Thursday, November 30, 2006 4:20 PM
To: The monit developer list
Subject: Re: <service> start Generates email noise

Hello,

this behavior isn't a bug - the 'nonexist' event type has positive and negative variants:

   Does not exist (positive 'nonexist')

     vs.

   Exists (negative 'nonexist')

The alert statement allows filtering only by the general event type, not by the particular polarity (there is no 'exist' option).

=> when you have registered the 'nonexist' event, you should get two alerts, informing you about the beginning and the end of the problem.
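
The filtering rule can be illustrated with a short Python sketch. It is a model of the behavior described here, not monit code: the Event class, the should_alert function and the label table are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str        # general event type, e.g. "nonexist"
    positive: bool   # True: the problem began; False: the problem ended

    def describe(self):
        labels = {
            ("nonexist", True): "Does not exist",
            ("nonexist", False): "Exists",
        }
        return labels.get((self.kind, self.positive), self.kind)


def should_alert(event, registered_kinds):
    """Filter on the general event type only; polarity is never consulted."""
    return event.kind in registered_kinds
```

With the set { nonexist, exec, connection } registered, both Event("nonexist", True) and Event("nonexist", False) pass the filter in this model, so both the "Does not exist" and the "Exists" alert go out.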

Martin


Aaron Scamehorn wrote:
Hello,

From version 4.8 to 4.8.2, the following bug has been introduced:

When we issue a monit <service> start command, we get a "Does not exist" email and a corresponding "Exists" email.

Here is the debug output showing this behavior in 4.8.2:
'LogClient' Error testing process id [11034] -- No such process
'LogClient' Error testing process id [11034] -- No such process
'LogClient' start: /cogcap/ccts/bin/logclnt
'LogClient' Error testing process id [11034] -- No such process
Monitoring enabled -- service LogClient
'LogClient' Error testing process id [11034] -- No such process
'LogClient' process is not running
Does not exist notification is sent to address@hidden
'LogClient' Error testing process id [11034] -- No such process
'LogClient' trying to restart
Monitoring disabled -- service LogClient (stop)
'LogClient' Error testing process id [11034] -- No such process
'LogClient' process is running with pid 11189
Exists notification is sent to address@hidden
'LogClient' zombie check passed [status_flag=0000]
'LogClient' loadavg(5min) check passed [current loadavg(5min)=0.2]
'LogClient' cpu usage check passed [current cpu usage=0.0%]
'LogClient' mem amount check passed [current mem amount=2776kB]


Under version 4.8, we don't get the annoying "Does not exist" and corresponding "Exists" emails:

'LogClient' Error testing process id [10970] -- No such process
'LogClient' Error testing process id [10970] -- No such process
'LogClient' start: /cogcap/ccts/bin/logclnt
'LogClient' Error testing process id [10970] -- No such process
Monitoring enabled -- service LogClient
'LogClient' Error testing process id [10970] -- No such process
'LogClient' Error testing process id [10970] -- No such process
'LogClient' zombie check passed [status_flag=0000]
'LogClient' loadavg(5min) check passed [current loadavg(5min)=0.1]
'LogClient' cpu usage check passed [current cpu usage=0.0%]
'LogClient' mem amount check passed [current mem amount=2776kB]



Additionally, in our config file, we have the following set:
set alert address@hidden only on { nonexist, exec, connection }

We shouldn't be getting an "Exists" email under any circumstance, should we?

Thanks,
Aaron



------------------------------------------------------------------------
_______________________________________________
monit-dev mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/monit-dev

