info-cvs

Re: File Corruption Problem on Unix


From: Eric Siegerman
Subject: Re: File Corruption Problem on Unix
Date: Wed, 14 Aug 2002 17:36:03 -0400
User-agent: Mutt/1.2.5i

On Wed, Aug 14, 2002 at 04:28:13PM -0400, Brian Robinson wrote:
> >Why is the machine rebooting?  Is someone doing it on purpose, or
> >is it crashing?  If the latter, well, it shouldn't be; fixing
> >that will solve the supposed "CVS" problem too.
> 
> Agreed.  In this case, we had a power failure in the area that crashed all 
> of our servers without battery backup (this was one of them).
> So it indeed did not come down gracefully.

Unsurprising so far.

> However, our other unix servers 
> that came down all came up without a hitch yesterday.
> We investigated this path a little more a few weeks ago -- running some CVS 
> activity, then rebooting with "init 0", then "boot".  It resulted in the 
> same kinds of problems

This *is* surprising.  I'd have expected "init 0" to sync before
going to the firmware.  Hmmm, /etc/rc0 says, in part:
        sync; sync; sync
        umountall
        umount /var/adm
        umount ...      # more standard FS's that umountall doesn't touch

Looks like there's a race there; I think the syncs should come
*after* the umounts.  Still, I'd have expected init, after
/etc/rc0 returns, to tell the kernel to go down *with* syncing,
but maybe it doesn't.
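
To make the reordering concrete, here is a minimal dry-run sketch of
the sequence I have in mind.  The function name and its "runner"
argument are made up for illustration; this is not the actual
/etc/rc0, and it only prints the commands unless told otherwise:

```shell
#!/bin/sh
# Hypothetical sketch: unmount first, then sync whatever is still mounted.
# By default the commands are only printed (dry run), never executed.
shutdown_steps() {
    run=${1:-echo}          # "echo" prints each command instead of running it
    "$run" umountall        # unmount the bulk of the filesystems
    "$run" umount /var/adm  # one of the FS's umountall doesn't touch
    "$run" sync             # flush buffers last, after the unmounts
}

shutdown_steps echo
```

Whether this ordering actually cures the problem is an assumption on
my part; it just removes the window between the syncs and the
umounts.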

>   Based on your comments, it might be more prudent to bring down the CVS 
> server gracefully before doing a planned reboot.

Indeed; though again, I'd expect rc0 to have done that:
/etc/rc0.d/K??inetsvc kills inetd, which should prevent new
server processes from starting up; then killall does in any
existing ones (in this case, they get the order right, so there's
no race condition).

> There was nothing active at the time.  The files impacted were created over 
> the past month.

Now this is really weird.  Buffers are supposed to get flushed
pretty frequently.  If files created weeks ago were lost, that
flushing somehow isn't happening.  Unless of course the files
were created in a directory that itself got trashed.

> We were able to trace them to specific CVS activities 
> (scripts) that ran then (based on timestamps on script output and the 
> lost+found files created).  (yes, yes, I know CVS may just be the 
> messenger... =)

... and a particularly informative one at that :-)
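
For anyone repeating that kind of tracing, a rough sketch of listing
recently modified files with their timestamps follows.  The function
name is invented for this example, and the directory to pass in (for
instance a filesystem's lost+found) and the 30-day default are
assumptions:

```shell
#!/bin/sh
# Hypothetical helper: list regular files under a directory that were
# modified within the last N days, with timestamps from ls -l.
recent_files() {
    dir=$1
    days=${2:-30}           # assumed default window: the past month
    find "$dir" -type f -mtime "-$days" -exec ls -l {} +
}
```

Comparing those timestamps against the script output is then a
manual job.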

Recommendations:
 1. find out why your filesystems aren't syncing during a planned
    reboot

 2. check whether the normal mechanism that periodically syncs
    buffers is actually functioning

 3. take further questions to a Solaris list/newsgroup; you'll
    get better answers there, including how to do (1) and (2) :-)

Good luck!

--

|  | /\
|-_|/  >   Eric Siegerman, Toronto, Ont.        address@hidden
|  |  /
Anyone who swims with the current will reach the big music steamship;
whoever swims against the current will perhaps reach the source.
        - Paul Schneider-Esleben



