monit-dev

Re: 4.0 showstopper?


From: Christian Hopp
Subject: Re: 4.0 showstopper?
Date: Thu, 18 Sep 2003 12:34:32 +0200 (CEST)

On Wed, 17 Sep 2003, Jan-Henrik Haukeland wrote:

> Christian Hopp <address@hidden> writes:
>
> > We should plan for the next release to find a way to prevent
> > possible race conditions by making all those functions reentrant, as
> > was done for sendmail.c.  Avoiding global data is now more
> > important than before.
>
> Yes and no. OK, we have added a new wait_start() thread which
> complicates things, but I firmly believe that previous versions of
> monit have many of the same race condition problems with
> signals. But, yes, absolutely, global data must be protected. Actually,
> the best thing to do is to have *no* global data at all and instead
> pass shared data as parameters to functions; this way you will have a
> reentrant system and probably a thread-safe system.

But for the next release we can aim for preventive protection of our
data, rather than fixing the protection after a problem shows up.
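
A minimal sketch of what "no global data, pass shared data as
parameters" could look like.  MailContext and format_helo() are
hypothetical names for illustration only, not monit's actual sendmail.c
code.  The function touches nothing but its arguments, so two threads
(or a signal handler and the main flow) can call it at the same time
with different contexts.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-call state; the caller owns it, nothing is global. */
typedef struct {
    char reply[128];  /* buffer that used to be a static/global */
    int  status;      /* 0 on success, -1 on error */
} MailContext;

/* Reentrant: reads and writes only what it is handed via parameters.
 * Returns the number of characters written, or -1 if the buffer is
 * too small. */
int format_helo(MailContext *ctx, const char *localhost) {
    int n = snprintf(ctx->reply, sizeof ctx->reply, "HELO %s\r\n", localhost);
    if (n < 0 || (size_t)n >= sizeof ctx->reply) {
        ctx->status = -1;
        return -1;
    }
    ctx->status = 0;
    return n;
}
```

Each caller declares its own MailContext on the stack and passes a
pointer to it, so no locking is needed for this data at all.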

> > Or we might want to rethink the whole multithreaded aspect in a
> > highly critical application like monit.  => Let me put it this way:
> >
> >   Is multithreaded code appropriate at all for system-critical
> >   applications?
>
> According to Alan Cox, "threads are for those that cannot program a
> state machine". That might be true, at least on a one-cpu system. But
> programming a state machine is hard and it's much easier to design and
> program a system with threads. For instance it's easy to start monit's
> httpd server in its own thread. It cannot run in its own process,
> because you will lose important shared information, such as the
> servicelist content. (I tried that in the 2.4 or 2.5 monit version and
> it didn't work well.) (Shared memory is an option, but ugh, no way,
> José :)

You are right. I just wanted to raise this provocative question so
that we think about the consequences of our approach, which has
actually been really successful.

(-:

> One option could be to rewrite monit to run in a big while-loop and do
> monitoring *and* http in the same process.
>
> while(true) {
>
>  do_monitor(); // Call validate
>  do_httpd(); // accept, parse and respond
>
> }

I think the httpd is the smallest problem here.  It mostly does
reporting work, and any interactive work can be protected by mutexes.

But we should think about "serializing" some other processes.  Maybe
it would be useful to move the sendmail work into its own thread and
queue the mails.  E.g., any failure is just appended to this queue, and
the mails are processed one after another.  Maybe this can simplify the
data paths.
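
A rough sketch of such a queue, under the assumption that one dedicated
mail thread drains it.  All names here (MailQueue, mail_queue_push() and
so on) are made up for illustration and are not existing monit
functions.  Checking threads only append; the mail thread is the only
one that ever talks to the mail server.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* One pending mail (hypothetical layout). */
typedef struct MailItem {
    char text[256];
    struct MailItem *next;
} MailItem;

/* FIFO of pending mails; the mutex protects head and tail. */
typedef struct {
    pthread_mutex_t lock;
    MailItem *head;
    MailItem *tail;
} MailQueue;

void mail_queue_init(MailQueue *q) {
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
}

/* Called by any thread that detects a failure. Returns 0 on success. */
int mail_queue_push(MailQueue *q, const char *text) {
    MailItem *m = malloc(sizeof *m);
    if (m == NULL)
        return -1;
    snprintf(m->text, sizeof m->text, "%s", text);
    m->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Removes the oldest mail, or returns NULL if the queue is empty.
 * The caller frees the returned item. */
MailItem *mail_queue_pop(MailQueue *q) {
    pthread_mutex_lock(&q->lock);
    MailItem *m = q->head;
    if (m != NULL) {
        q->head = m->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    pthread_mutex_unlock(&q->lock);
    return m;
}

/* Body of the mail thread's loop: deliver everything queued so far,
 * one mail after another, via the supplied callback. Returns the
 * number of mails processed. */
int mail_queue_drain(MailQueue *q,
                     void (*deliver)(const char *text, void *arg),
                     void *arg) {
    int n = 0;
    MailItem *m;
    while ((m = mail_queue_pop(q)) != NULL) {
        deliver(m->text, arg);  /* runs outside the lock */
        free(m);
        n++;
    }
    return n;
}
```

The delivery callback runs without the lock held, so a slow mail
server never blocks the threads that report failures.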

There is another thing as well.  I thought of giving monit a
watchdog thread.  Every "permanent" thread has to reset a semaphore
within a specific time frame (maybe the loop time), and if this doesn't
happen, either the thread is restarted or monit has to restart (or
stop) itself.
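
To make the watchdog idea concrete, here is a sketch that swaps the
semaphore for a mutex-protected "last heartbeat" timestamp per thread,
which is a variation on the same scheme and not a settled design.  The
names (Watchdog, watchdog_beat(), ...) and the fixed slot count are
assumptions.  The current time is passed in as a parameter so the logic
has no hidden global state.

```c
#include <pthread.h>
#include <time.h>

#define WD_MAX_THREADS 8  /* arbitrary limit for this sketch */

typedef struct {
    pthread_mutex_t lock;
    time_t timeout;                    /* max seconds between beats */
    int    used[WD_MAX_THREADS];       /* 1 if the slot is registered */
    time_t last_beat[WD_MAX_THREADS];  /* time of the last heartbeat */
} Watchdog;

void watchdog_init(Watchdog *wd, time_t timeout) {
    pthread_mutex_init(&wd->lock, NULL);
    wd->timeout = timeout;
    for (int i = 0; i < WD_MAX_THREADS; i++) {
        wd->used[i] = 0;
        wd->last_beat[i] = 0;
    }
}

/* A permanent thread registers once; returns its slot, or -1 if full. */
int watchdog_register(Watchdog *wd, time_t now) {
    int slot = -1;
    pthread_mutex_lock(&wd->lock);
    for (int i = 0; i < WD_MAX_THREADS; i++) {
        if (!wd->used[i]) {
            wd->used[i] = 1;
            wd->last_beat[i] = now;
            slot = i;
            break;
        }
    }
    pthread_mutex_unlock(&wd->lock);
    return slot;
}

/* Each permanent thread calls this once per loop iteration. */
void watchdog_beat(Watchdog *wd, int slot, time_t now) {
    pthread_mutex_lock(&wd->lock);
    if (slot >= 0 && slot < WD_MAX_THREADS && wd->used[slot])
        wd->last_beat[slot] = now;
    pthread_mutex_unlock(&wd->lock);
}

/* The watchdog thread calls this periodically. Returns the first slot
 * whose last heartbeat is older than the timeout, or -1 if all are
 * alive. */
int watchdog_find_stalled(Watchdog *wd, time_t now) {
    int stalled = -1;
    pthread_mutex_lock(&wd->lock);
    for (int i = 0; i < WD_MAX_THREADS; i++) {
        if (wd->used[i] && now - wd->last_beat[i] > wd->timeout) {
            stalled = i;
            break;
        }
    }
    pthread_mutex_unlock(&wd->lock);
    return stalled;
}
```

What to do with a stalled slot (restart the thread, or restart or stop
monit) is left to the caller, since that policy is still open.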

> But summa summarum, I think we are better off using threads. We
> must only remember to do it right :) Besides, many of the latest problems
> (except sendmail) were not directly related to threads but to signals.

Sure.  But it helps if we discuss this from time to time.  And with
signals you usually have to take care of the same issues as with
threads.

This is also a matter for discussion... I don't really like the
siglongjmp construction... we have made it safe in sendmail now, but I
would always hesitate to use it.  As the man page says... "longjmp()
and siglongjmp() make programs hard to understand and maintain.  If
possible an alternative should be used."
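
One commonly used alternative, sketched here under the assumption that
the jump is only needed to break out of a blocking call on timeout: the
handler just sets a flag, and the handler is installed without
SA_RESTART so that a blocking read() or connect() returns with EINTR
and the normal code path can check the flag.  The function names are
hypothetical.  The flag itself has to be file-scope data, but it is a
single sig_atomic_t and nothing else.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* The only shared state: written by the handler, read by normal code. */
static volatile sig_atomic_t timed_out = 0;

/* Async-signal-safe: does nothing but set the flag. */
static void on_alarm(int signo) {
    (void)signo;
    timed_out = 1;
}

/* Installs the SIGALRM handler. Returns 0 on success, -1 on error. */
int timeout_install(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  /* no SA_RESTART: blocking calls fail with EINTR */
    return sigaction(SIGALRM, &sa, NULL);
}

/* Nonzero if the alarm fired since the last timeout_clear(). */
int timeout_pending(void) {
    return timed_out != 0;
}

void timeout_clear(void) {
    timed_out = 0;
}
```

The caller would clear the flag, arm alarm(), do the blocking call, and
on EINTR check timeout_pending() instead of being thrown back to a
sigsetjmp() point.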


CHopp

-- 
Christian Hopp                                email: address@hidden
Institut für Elektrische Informationstechnik             fon: +49-5323-72-2113
TU Clausthal, Leibnizstr. 28, 38678 Clausthal-Zellerf.   fax: +49-5323-72-3197
                             pgpkey: https://www.iei.tu-clausthal.de/pgp-keys/




