monit-dev

Re: Features...


From: Jan-Henrik Haukeland
Subject: Re: Features...
Date: Fri, 26 Sep 2003 17:14:59 +0200
User-agent: Gnus/5.1002 (Gnus v5.10.2) XEmacs/21.4 (Reasonable Discussion, linux)

Christian Hopp <address@hidden> writes:

> clean_sems()
> start_validate()
> start_httpd()
>
> while TRUE
>       sleep (cycletime) (or select, cycletime>max_expected_cycle)
>         if (! sem_validate )
>               restart_validate()
>       sem_validate= FALSE
>
>       if (run_httpd && ! sem_httpd )
>               restart_httpd()
>       sem_httpd= FALSE
> end

Okay, this clarified a bit what you had in mind. It's not a bad idea
on paper, by all means, but still there are fundamental problems here.

1) First, I still think that this is an attempt to work around
   something that is wrong and should be fixed in the code itself.

2) You cannot use sleep (cycletime) in the watchdog thread. This is
   too uncertain; the validate thread can have a lot of work to do and
   it can take time before it can set the semaphore. Blindly
   restarting the validate thread without knowing whether it's really
   running or not is not a good solution. Another and much bigger
   problem: since the validate thread can run fast or slow depending
   on how much work it has to do, the time at which it sets the
   semaphore will vary. It will set the semaphore at cycletime+X,
   where X is the time it takes the validate thread to run. Think of
   this as time zones: the watchdog thread is located at GMT and says
   that it should take the validate thread 80 days (cycletime) to
   travel around the world. The problem is that the validate thread
   must sometimes go by boat, sometimes by airplane, and sometimes it
   must simply wait for a connection. In other words, in one cycle it
   will use 60 days to travel around the world and in another cycle
   it will use 120 days. See the problem? Because the validate thread
   cannot and will not run in constant time, you *cannot* use
   constant time to check if it's running.

3) If the validate thread and the http thread are stuck in a mutex
   deadlock, simply calling restart_xxx() will hang as well, and the
   watchdog thread will just fill up the call stack with
   restart_xxx() calls until SIGSEGV.

4) There are many more situations that can go wrong, just give me time
   to think them up if you are not convinced already :-)
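
To make point 2 concrete, here is a small sketch in C. It is not monit
code; count_false_restarts() and its parameters are names made up for
this illustration. It replays the proposed loop against a validate
thread that is healthy but takes a varying time per cycle, and counts
how often the watchdog would restart it anyway. It assumes the
validate thread sets the flag at the end of each cycle and that the
watchdog wakes up every cycletime units, checks the flag, and clears
it.

```c
#include <stddef.h>

/*
 * Hypothetical illustration, not monit code.
 *
 * durations[i] is how long cycle i of a healthy validate thread takes.
 * The thread sets its flag when a cycle completes. The watchdog wakes
 * every `cycletime` units, would restart the thread if the flag is not
 * set, and then clears the flag (as in the quoted pseudocode).
 *
 * Returns the number of restarts the watchdog would have triggered
 * even though the thread never actually stopped.
 */
int count_false_restarts(const int *durations, size_t n, int cycletime)
{
    int restarts = 0;
    int now = 0;                  /* time at which the current cycle starts */
    int next_check = cycletime;   /* time of the next watchdog wakeup */
    int flag = 0;                 /* plays the role of sem_validate */

    for (size_t i = 0; i < n; i++) {
        int done = now + durations[i];

        /* Every watchdog wakeup that happens before this cycle completes. */
        while (next_check < done) {
            if (!flag)
                restarts++;       /* thread is alive, just not finished yet */
            flag = 0;
            next_check += cycletime;
        }

        flag = 1;                 /* cycle finished, thread reports in */
        now = done;
    }
    return restarts;
}
```

With the numbers from the example above (cycletime 80, one cycle of 60
followed by one of 120), the function counts one restart although the
thread was working the whole time. With two cycles of 60 it counts
none.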

-- 
Jan-Henrik Haukeland



