
Re: Idea for reducing disk IO on tagging operations

From: Paul Sander
Subject: Re: Idea for reducing disk IO on tagging operations
Date: Sun, 20 Mar 2005 17:00:54 -0800

On Mar 20, 2005, at 3:54 PM, address@hidden wrote:

* Mark D. Baushke (address@hidden) wrote:

Dr. David Alan Gilbert <address@hidden> writes:

OK, if I create a dummy ",foo.c," before
modifying (or create a hardlink with that name
to foo.c,v ?) would that be sufficient?

I would say that it is likely necessary, but may
not be sufficient.

Hmm ok.

Or perhaps create the ,foo.c, as I normally
would - but if I can use this overwrite trick on
the original then I just delete the ,foo.c,

I am unclear how this lets you perform a speedup.

I only create the ,foo.c, file - I don't write anything into it; the
existence of the file is enough to act as the RCS lock. If I can do my
in-place modification, I delete this file after doing it; if not, I
proceed as normal, write the ,foo.c, file, and do the rename as you
normally would.

You're forgetting something: The RCS commands will complete read-only operations on RCS files even in the presence of the comma files owned by other processes. Your update protocol introduces race conditions in which the RCS file is not self-consistent at all times.

There's also the interrupt issue: Killing an update before it completes leaves the RCS file corrupt. You'd have to build in some kind of crash recovery. But RCS already has that by way of the comma file, which can simply be deleted. Other crash recovery algorithms usually involve transaction logs that can be reversed and replayed, or the creation of backup copies. None of these are more efficient than the existing RCS update protocol.

So the issue is what happens if the interrupt
occurs as I'm overwriting the white space to add
a tag; hmm yes;

Correct. Depending on the filesystem type and the
level of I/O, your rewrite could affect up to three
file blocks and the directory data.

is it possible to guard against this by using a
single call to write(2) for that?

Not for all possible filesystem types.

You'd have to guarantee that the write is atomic and flushes results completely to disk, even in the presence of things like power failures. It's hard to make this guarantee given all the buffering that goes on below the write(2) API.

Optimizing for tagging does not seem very useful
to me as we typically do not drop that many tags
on our repository.

In the company I work for we are very tag-heavy, but more importantly
it is the tagging that gets in people's way and places the strain
on the write bandwidth of the discs/RAID.

I once built a successful system that tracked desirable configurations by building lists of file/version pairs, then committing and tagging the lists. The lists were built by polling the Entries files in workspaces (and making sure there were no uncommitted changes). This was fast and efficient, and it opens you up to use the optimization I mentioned earlier. And if you rely on floating tags, such lists could track the history of the tags as well.

In addition, an algebra can be easily written to manipulate such lists. Combine this with a way to link these lists with your defect tracking system, and you have the tools to build a very good change control system.
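The algebra on such lists is straightforward. A minimal sketch, with names of my own choosing (the original system polled CVS/Entries files to build the lists): treat each configuration as a mapping of file to version, and the basic operation is a three-way diff of two configurations:

```python
def diff_configs(old, new):
    """Compare two configurations, each a dict of {file: version}.

    Illustrative sketch of an 'algebra' over file/version lists:
    returns files added, files removed, and files whose version changed.
    """
    added = {f: v for f, v in new.items() if f not in old}
    removed = {f: v for f, v in old.items() if f not in new}
    changed = {f: (old[f], new[f])
               for f in old.keys() & new.keys() if old[f] != new[f]}
    return added, removed, changed
```

Linking the `changed` set for a pair of tagged configurations to defect reports is the kind of glue a change control system would add on top.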

Paul Sander | "Lets stick to the new mistakes and get rid of the old
address@hidden | ones" -- William Brown
