help-cfengine

Re: cfagent hangs


From: Luke A. Kanies
Subject: Re: cfagent hangs
Date: Thu, 4 Dec 2003 12:26:47 -0600 (CST)

On Mon, 24 Nov 2003, Jeff Wasilko wrote:

> Hi:
>
> I've been having problems with cfagent hanging for multiple days.
> It's usually started by some sort of network problem (we've had a
> bit of instability here that we've traced down to a failing gigE
> switch).
>
> cfagent is started by cfexecd. Is there any way to get cfexecd to
> kill the wedged cfagent?
>
> lexx 7 ># ps -ef | grep cfagent
>     root 17435   375  0   Nov 22 ?     0:04 /is/local/state/cfengine/bin/cfagent
>
> lexx 8 ># truss -p 17435
> recv(8, 0xFFBF2618, 8, 0)       (sleeping...)
>
> It seems to be hung in a copy of a big tree (pushing out our
> /usr/local equivalent):
>
> This is the mail I got from cfengine when I killed the hung
> cfagent:
>
> cfengine:lexx: Received signal 15 (SIGTERM) while doing
> [lock.cfagent_conf.lexx.copy._is_dist_pkg__is_dist_pkg]
> cfengine:lexx: Logical start time Sat Nov 22 16:20:34 2003
> cfengine:lexx: This sub-task started really at Sat Nov 22 16:20:34 2003

[obviously, I'm catching up on email]

I had a similar problem.  It was somehow related to a bad compile of
cfengine against BerkeleyDB; I don't know exactly what went wrong, but
eventually cfagent would hang forever trying to take locks in the lock_db
file.  And I mean forever; I'm talking fork bomb.

It would be nice if cfexecd were configurable to kill child processes
after a certain amount of time; I would settle for a hard-coded value, but
a configurable one would be best.  I think an hour is reasonable, but four
might be better for the general case.
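
To make the idea concrete, here is a minimal sketch of that kind of
watchdog.  It is Python, not anything cfexecd actually does, and the
function name, the grace period, and the one-hour figure in the usage
comment are my own assumptions for illustration:

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds, grace_seconds=10):
    """Run cmd as a child process and kill it if it overruns.

    Hypothetical helper, not part of cfengine.  Returns a tuple
    (returncode, timed_out).
    """
    proc = subprocess.Popen(cmd)
    try:
        # Normal case: the child finishes within the time limit.
        return proc.wait(timeout=timeout_seconds), False
    except subprocess.TimeoutExpired:
        # Ask politely first (SIGTERM on Unix) so the child can clean up.
        proc.terminate()
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            # Still wedged: force it (SIGKILL on Unix) and reap it.
            proc.kill()
            proc.wait()
        return proc.returncode, True


# Example use (assumed one-hour limit; the path is the one from the
# original mail):
#   run_with_timeout(["/is/local/state/cfengine/bin/cfagent"], 3600)
```

Sending SIGTERM before the hard kill matters here, since the mail quoted
above shows cfagent reporting which lock it held when it received
SIGTERM.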

This was also version 2.0.8p1, but like I said, it was a bad compile.  We
recompiled against BerkeleyDB 4.0.14 or something and it worked fine.
This only happened on AIX.  I also had to go back and delete every db
file on every machine with this problem, since they all appeared to be
irretrievably corrupt.

-- 
"But these [serious NT security flaws] are not inherent flaws in the
operating system -- they don't happen by accident. They are the result
of deliberate and well-thought-out efforts." --Mike Nash, Microsoft.
The _flaws_ are deliberate?



