Re: History file clobbered by multiple simultaneous exits

From: Geoff Kuenning
Subject: Re: History file clobbered by multiple simultaneous exits
Date: Thu, 25 Jul 2013 00:30:46 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2 (gnu/linux)

>       As for the problem... the fact that you're using 4.2 would seem
> to make the algorithm:
> open(<zero as new file>)
> write(whatever we have as history)
> close(set eof to where we are).
> What file system are you on?  Is it local or networked?

Local, ext3.

> one way for it to be zero is if the last bash exiting had no history,
> because the zero/truncate on each open can zero the file from any
> previous bash being left.

I thought of that too, but it's not the case for me.  Even after the
failure has wiped the old history, my new shells have at least 1-2
commands kicking around.  So I could imagine my nice 500-line history
turning into a 2-line one, but not zero-length.

> I can also see the possibility of some kernel or file system routine
> waiting after you issue the close call so that it doesn't have to zero
> the area where data is arriving.  I.e. it might only zero the file beyond
> the valid text AFTER some delay (5 seconds?) OR might wait until the file
> is closed, so if you completely overwrite the old file with text, the
> kernel won't have to zero anything out.

If so, that would be a big bug.  When you're truncating a file to a
shorter length, some filesystems do indeed delay freeing the blocks in
hopes of reusing them.  But the length is set to zero when the O_TRUNC
happens, and likewise if you write n bytes, the length is immediately
increased by n.  There are certain races on some filesystems that could
cause the n bytes to be incorrect (e.g., garbage), but that generally
happens only on system crashes.  There's a paper on this from a few
years back; I'd have to review it to be sure but my recollection is that
you can't get zero-length files in the absence of system or hardware
failures.  (However, I'm not sure whether they used SMPs...)

Still, I suppose it could be a kernel bug.  Maybe I'll have to write a
better test program and let it run overnight.

> in the case of write...close to a non-pre-zeroed record, the operation
> becomes a read-modify-write.  Thing is, if proc 3 goes out for the
> partial buffer (~4k is likely), it may have been completely zeroed
> from proc 2 closing where proc 3 wants to write.

No generic Linux filesystem that I'm aware of zeroes discarded data at
any time; it's too expensive.  And the partial buffer would be in-memory
at that point, since the processes can exit much faster than the buffer
could be written to disk.  So I don't think that's it.

> (multi-threaded ops on real multiple execution units do the darnedest things).

Ain't that the truth!
    Geoff Kuenning   address@hidden   http://www.cs.hmc.edu/~geoff/

An Internet that is not Open represents a potentially grave risk to
freedoms of many sorts -- freedom of speech and other civil liberties,
freedom of commerce, and more -- and that openness is what we must so
diligently work to both preserve and expand.
                -- Lauren Weinstein
