monit-dev

Re: 4.0 showstopper?


From: Jan-Henrik Haukeland
Subject: Re: 4.0 showstopper?
Date: Wed, 17 Sep 2003 19:34:29 +0200
User-agent: Gnus/5.1002 (Gnus v5.10.2) XEmacs/21.4 (Reasonable Discussion, linux)

Martin Pala <address@hidden> writes:

> Jan-Henrik Haukeland wrote:
>
>>Martin Pala <address@hidden> writes:
>>
>>
>>>during "lastminute" tests i saw following problem:
>>>
>>>[CEST Sep 17 14:51:46] AssertException: at socket.c:333
>>>aborting..
>>>
>>
>> Oops!
> I identified the problem - it is caused by a race condition between
> method executions:
>
> 1.) monit detected that the process is not running, and the main monit
> thread forks the start method (a separate process, which inherits
> all file descriptors)
>
> 2.) the monit main thread creates a new thread which waits for the
> service to start - if the service does not start (a timeout
> occurs), this thread posts a timeout event, which causes an alert and
> continues by connecting to the smtp server and sending the message.
> The file descriptor of the socket opened to the smtp server is shared
> between all threads (wait_start and main)
>
> 3.) while wait_start is waiting for the service to start, the monit main
> thread executes another validate cycle and detects that the process
> is not running (independently of the wait_start thread) - as usual, the
> new process inherits all open file descriptors (including the smtp
> server socket fd)

I think this diagnosis is correct, except that it involves the Socket_T
object used in sendmail. The reason you got an AssertException for
the Socket_T object at socket.c:333 is probably that something like
the following occurred:

THREAD1                THREAD2

initialize_server()    

write++                initialize_server()

finalize_server()      

                       write++  <--- AssertException

                       finalize_server()      
                       
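The interleaving above can be replayed deterministically in a single
thread. This is only a minimal sketch: FakeSocket, write_line() and
replay_interleaving() are hypothetical names, and the bodies of
initialize_server() and finalize_server() are stand-ins, not the code in
sendmail.c. A failed write marks the point where the real code would
raise the AssertException.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the global Socket_T used by sendmail. */
typedef struct { bool open; } FakeSocket;

static FakeSocket storage;
static FakeSocket *shared = NULL;    /* one global, shared by every caller */

/* Stand-in bodies; only the names come from the discussion above. */
static void initialize_server(void) { storage.open = true; shared = &storage; }
static void finalize_server(void) {
    if (shared) shared->open = false;
    shared = NULL;
}

/* Returns false where the real code would raise an AssertException. */
static bool write_line(void) { return shared != NULL && shared->open; }

/* Replays the steps in diagram order and counts the failed writes. */
int replay_interleaving(void) {
    int failed = 0;
    initialize_server();             /* THREAD1 */
    if (!write_line()) failed++;     /* THREAD1 write: socket is open */
    initialize_server();             /* THREAD2 reuses the same global */
    finalize_server();               /* THREAD1 tears the global down */
    if (!write_line()) failed++;     /* THREAD2 write: socket is gone */
    finalize_server();               /* THREAD2: nothing left to close */
    return failed;
}
```

Only the second write fails, because THREAD1's finalize_server() closes
the one socket both callers depend on.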

In this case, the problem is that the Socket_T object is a global
shared resource, so the sendmail module will have problems when called
from more than one thread. I'll rewrite the sendmail.c file and fold
initialize_server() and finalize_server() into the sendmail() function;
this way the module is at least reentrant. I'm not so concerned about
the descriptors since (if I remember correctly) descriptors are dup'ed
on a fork().
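A rough sketch of what folding setup and teardown into one function
could look like, assuming the connection state moves from a global onto
the caller's stack. Connection, conn_open(), conn_write(), conn_close()
and send_message_reentrant() are hypothetical names for illustration,
not the actual sendmail.c rewrite.

```c
#include <stdbool.h>

/* Hypothetical per-call connection state, replacing the shared global. */
typedef struct {
    bool open;
    int lines_written;
} Connection;

void conn_open(Connection *c)  { c->open = true; c->lines_written = 0; }
void conn_close(Connection *c) { c->open = false; }

/* Writes one line; fails if this caller's own connection is closed. */
bool conn_write(Connection *c) {
    if (!c->open) return false;
    c->lines_written++;
    return true;
}

/* Setup and teardown are folded into the call, so each caller owns its
 * connection. Returns the number of lines written, or -1 on failure. */
int send_message_reentrant(int lines) {
    Connection c;                    /* lives on this caller's stack */
    conn_open(&c);
    for (int i = 0; i < lines; i++) {
        if (!conn_write(&c)) {
            conn_close(&c);
            return -1;
        }
    }
    conn_close(&c);
    return c.lines_written;
}
```

Because no state outlives the call, one caller closing its connection
cannot invalidate another caller's connection.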


-- 
Jan-Henrik Haukeland



