monit-dev

Re: 4.0 showstopper?


From: Martin Pala
Subject: Re: 4.0 showstopper?
Date: Wed, 17 Sep 2003 20:13:35 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030908 Debian/1.4-4

Jan-Henrik Haukeland wrote:

Martin Pala <address@hidden> writes:

Jan-Henrik Haukeland wrote:

Martin Pala <address@hidden> writes:


During last-minute tests I saw the following problem:

[CEST Sep 17 14:51:46] AssertException: at socket.c:333
aborting..

Oops!
I identified the problem: it is caused by a race condition in the
execution of the service methods:

1.) monit detects that the process is not running, and the main monit
thread forks the start method (a separate process, which inherits
all file descriptors)

2.) the monit main thread creates a new thread which waits for the
service to start. If the service does not start (a timeout occurs),
this thread posts a timeout event, which triggers an alert and
continues by connecting to the smtp server and sending the message. The
file descriptor of the socket opened to the smtp server is shared
between all threads (wait_start and main)

3.) while wait_start is waiting for the service to start, the monit main
thread executes another validate cycle and detects that the process
is not running (independently of the wait_start thread). As usual, the
new process inherits all open file descriptors (including the smtp
server socket fd)
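
The descriptor inheritance described in the steps above can be checked with a small sketch. It is not monit code: the function name child_inherits_descriptor is hypothetical, and a pipe stands in for the smtp server socket. The child writes through a descriptor that the parent opened before fork(), and the parent reads the byte back.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of monit.
 * Returns 1 if a forked child can use a descriptor that the parent
 * opened before fork(), 0 if it cannot, -1 if setup fails. */
int child_inherits_descriptor(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: fds[1] was opened by the parent but is valid here too. */
        char c = 'x';
        _exit(write(fds[1], &c, 1) == 1 ? 0 : 1);
    }

    /* Parent: close our write end, then read what the child wrote. */
    close(fds[1]);
    char c = 0;
    ssize_t n = read(fds[0], &c, 1);
    close(fds[0]);

    int status;
    waitpid(pid, &status, 0);

    return (n == 1 && c == 'x') ? 1 : 0;
}
```

Each forked start method gets its own copy of every descriptor open in monit at that moment, which is why the smtp socket fd shows up in the new process.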

I think this diagnosis is correct, except that it involves the Socket_T
object used in sendmail. The reason you got an AssertException for
the Socket_T object in socket.c:333 is probably that something like
the following occurred:

THREAD1                THREAD2

initialize_server()
write++                initialize_server()

finalize_server()
                       write++  <--- AssertException

finalize_server()

In this case, the problem is that the Socket_T object is a global
shared resource, so the sendmail module will have problems when called
from more than one thread. I'll rewrite the sendmail.c file and fold
initialize_server() and finalize_server() into the sendmail() function;
this way the module is at least reentrant. I'm not so concerned about
the descriptors since (if I remember correctly) descriptors are dup'ed
on a fork().
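
The difference between the shared and the reentrant layout can be sketched as follows. This is only an illustration under stated assumptions: conn_t is a hypothetical stand-in for Socket_T, none of the function names come from sendmail.c, and the interleaving from the diagram is replayed in a single thread so that the outcome is deterministic. A failed write returns -1 where the real code would raise the AssertException.

```c
/* Hypothetical stand-in for Socket_T; not monit's real type. */
typedef struct { int open; } conn_t;

/* Shared layout: one global connection used by every caller. */
static conn_t shared_conn;

static void shared_initialize(void) { shared_conn.open = 1; }
static void shared_finalize(void)   { shared_conn.open = 0; }
static int  shared_write(void)      { return shared_conn.open ? 0 : -1; }

/* Replays the diagram's interleaving against the global object.
 * Returns the result of the second caller's write. */
int replay_shared_interleaving(void)
{
    shared_initialize();       /* thread 1: initialize */
    (void)shared_write();      /* thread 1: write      */
    shared_initialize();       /* thread 2: initialize */
    shared_finalize();         /* thread 1: finalize   */
    int rc = shared_write();   /* thread 2: write on a closed connection */
    shared_finalize();         /* the remaining finalize call */
    return rc;
}

/* Reentrant layout: each caller owns its own connection object. */
static int conn_write(const conn_t *c) { return c->open ? 0 : -1; }

/* Replays the same interleaving with one object per caller. */
int replay_local_interleaving(void)
{
    conn_t t1 = {0}, t2 = {0};
    t1.open = 1;               /* thread 1: initialize */
    (void)conn_write(&t1);     /* thread 1: write      */
    t2.open = 1;               /* thread 2: initialize */
    t1.open = 0;               /* thread 1: finalize   */
    int rc = conn_write(&t2);  /* thread 2: write still succeeds */
    t2.open = 0;               /* thread 2: finalize   */
    return rc;
}
```

With the global object, the first caller's finalize closes the connection the second caller is about to use. Once the connection lives inside each call, one caller's finalize cannot affect another caller.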


You are right.

The problem still occurs in sendmail() and is caused exactly as you described. You are right about the dup'ed file descriptors too: the original method (not caring about FD_CLOEXEC) was better (simpler, more secure, and requires no mutex locking), so I will revert to it.
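
For reference, FD_CLOEXEC is a per-descriptor flag set with fcntl(). It closes the descriptor when the process calls one of the exec functions and has no effect at fork() itself. A minimal sketch of setting and checking it follows; the helper names set_cloexec and is_cloexec are hypothetical and not taken from monit.

```c
#include <fcntl.h>

/* Hypothetical helper: set FD_CLOEXEC on fd, keeping other descriptor
 * flags. Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Hypothetical helper: returns 1 if FD_CLOEXEC is set on fd,
 * 0 if it is clear, -1 on error. */
int is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```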

Thanks :)

Martin




