Re: Bootstrapping
From: Luke A. Kanies
Subject: Re: Bootstrapping
Date: Wed, 18 Feb 2004 16:56:39 -0600 (CST)
On Wed, 18 Feb 2004, Eric Sorenson wrote:
> Cfengine has the same problem, except when the host key changes
> you have to track down why this one machine can't get updates and
> the users are complaining.
This is another problem that I consider unsolved. How do you know all of
your hosts are correctly updating themselves? How do you even define
'correctly'?
At my previous client I was reading all syslog messages from a pipe
written to by syslog-ng, and then storing those logs in a database. I
tacked a small filter on that reader and had it start storing last-seen
records in LDAP for every host (with some throttling so I didn't spam the
LDAP server). Then I defined 'recent' for my various services (cfengine
and ISconf, in that case), and had a script which could easily check
whether all of my hosts were 'recent'. I never went so far as to connect
it to a tool like Nagios, but I would have liked to.
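A minimal sketch of that kind of check, with a plain dict standing in for the LDAP last-seen records. Every name here (RECENT, record_seen, stale_hosts), the thresholds, and the throttle interval are assumptions for illustration, not the script I actually ran:

```python
from datetime import datetime, timedelta

# Hypothetical thresholds: how old a last-seen record may be, per service.
RECENT = {"cfengine": timedelta(hours=2), "isconf": timedelta(days=1)}

def record_seen(store, host, service, when, min_gap=timedelta(minutes=10)):
    """Update the last-seen record, skipping writes closer together than min_gap."""
    previous = store.get(host, {}).get(service)
    if previous is not None and when - previous < min_gap:
        return False  # throttled: the stored record is fresh enough
    store.setdefault(host, {})[service] = when
    return True

def stale_hosts(master_list, store, now):
    """Return {host: [services]} for hosts whose records are missing or too old."""
    stale = {}
    for host in master_list:
        seen = store.get(host, {})
        late = [svc for svc, limit in RECENT.items()
                if svc not in seen or now - seen[svc] > limit]
        if late:
            stale[host] = late
    return stale
```

Driving the loop from the master host list rather than from the stored records is what lets a host that has never logged anything show up as stale.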
This was a pretty good method in that it used my master host list to tell
me the status of every host in the list. However, it had a serious
failing: It didn't have a good definition of correct. Of course, it was
also subject to failures of the syslog system (syslog-ng dies, the reading
script dies, etc.), but that was solvable through other methods.
So, as to 'correctly updating': If a client can successfully copy
_anything_, is it working? What if it's just running cfagent at all?
What if it has some errors, such as being incapable of starting a process?
I don't believe it's possible to have cfagent collect the number of
errors, or to classify a portion of an update as 'critical' or 'optional',
but that would certainly be useful. If I could collect that information
and then use it to have the client update my LDAP repository as the last
stage in any run, then I would believe I had a good definition of a
functional system. Just a simple (No errors/Some Minor Errors/Some
Critical Errors/Total nonfunction/No Data) stat of some kind would be very
useful.
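As a rough sketch of what that stat could look like, assuming the client could somehow hand over a per-run report with error counts. The report format (the "completed", "critical", and "minor" keys) is entirely hypothetical; as noted above, I don't believe cfagent provides this today:

```python
from enum import Enum

class RunStatus(Enum):
    NO_ERRORS = "No errors"
    MINOR_ERRORS = "Some Minor Errors"
    CRITICAL_ERRORS = "Some Critical Errors"
    NONFUNCTIONAL = "Total nonfunction"
    NO_DATA = "No Data"

def classify(report):
    """Map a hypothetical per-run report (dict or None) to one status value."""
    if report is None:
        return RunStatus.NO_DATA          # the host never reported at all
    if not report.get("completed", False):
        return RunStatus.NONFUNCTIONAL    # the run started but never finished
    if report.get("critical", 0) > 0:
        return RunStatus.CRITICAL_ERRORS  # critical errors outrank minor ones
    if report.get("minor", 0) > 0:
        return RunStatus.MINOR_ERRORS
    return RunStatus.NO_ERRORS
```

The client would write the resulting value to the repository as the last stage of its run, so a missing or old record maps to "No Data" on the checking side.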
I'm working on it....
Luke
--
Health is merely the slowest possible rate at which one can die.