Re: Idea for reducing disk IO on tagging operations

From: Mark D. Baushke
Subject: Re: Idea for reducing disk IO on tagging operations
Date: Sun, 20 Mar 2005 14:08:59 -0800

Dr. David Alan Gilbert <address@hidden> writes:

> * Mark D. Baushke (address@hidden) wrote:
> Hi Mark,
>   Thanks for your reply.
> > Dr. David Alan Gilbert <address@hidden> writes:
> > 
> > > So - here are my questions/ideas - I'd appreciate comments to tell
> > > me whether I'm on the right lines:
> > >   1) As I understand it the tag data is the
> > >   first of the 3 main data structures in the RCS
> > >   file (tag, comments, diffs) and that when I do
> > >   pretty much any CVS operation I rewrite the
> > >   whole file - is this correct?
> > 
> > CVS write operations on a foo.c,v repository file
> > will write ,foo.c, and then when the write
> > operation is successful and without any errors, it
> > does a rename (",foo.c,", "foo.c,v"); to make the
> > new version the official version. While the
> > ,foo.c, file exists, RCS commands will consider
> > the file locked.
> > 
> > It is desirable to use RCS write semantics, as
> > many other tools out there (cf. ViewCVS) use RCS
> > on the repository and want to obey RCS locking.
> OK, if I create a dummy ",foo.c," before
> modifying (or create a hardlink with that name
> to foo.c,v ?) would that be sufficient?

I would say that it is likely necessary, but may
not be sufficient.

> Or perhaps create the ,foo.c, as I normally
> would - but if I can use this overwrite trick on
> the original then I just delete the ,foo.c,
> file.

I am unclear how this lets you perform a speedup.

> Is the problem that things are allowed to read
> the original foo.c,v while you are creating the
> new version?

I am given to understand that many of the
ancillary tools that surround CVS rely on
being able to read a consistent ,v file at all
times.

> > >   3) So the idea is that when I add a tag I add
> > >   a bunch of white space after the tag (lets say
> > >   1KB of spaces split into 64 byte lines or
> > >   similar); when I come to add the next tag I
> > >   check if there is plenty of white space, if
> > >   there is then instead of rewriting the file I
> > >   just overwrite the white space with my new tag
> > >   data; if there is no space then as I rewrite
> > >   the file I add another lump of white space.
> > 
> > This has the potential to more easily corrupt the
> > RCS file if the operation is interrupted for any
> > reason.
> The act of rewriting to add extra space would be
> performed using the existing mechanism (with
> just some extra space added in RCS_rewrite);
> so that can't be a problem.

Adding extra data to the ,foo.c, file during the
normal write operation should not be a problem.
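
To make the write-then-rename sequence concrete, here is a minimal
sketch in C. It is not the CVS source; the helper name
rewrite_via_rename and the assumption that the file lives in the
current directory are mine, for illustration only.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not taken from CVS): replace "<base>,v" by writing
 * the new contents to ",<base>," and renaming it into place on success.
 * "base" is assumed to be a plain file name in the current directory.
 * Returns 0 on success, -1 on failure. */
static int rewrite_via_rename(const char *base, const char *data, size_t len)
{
    char tmp[4096], final[4096];
    int n1 = snprintf(tmp, sizeof tmp, ",%s,", base);
    int n2 = snprintf(final, sizeof final, "%s,v", base);
    if (n1 < 0 || n1 >= (int)sizeof tmp || n2 < 0 || n2 >= (int)sizeof final) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* O_EXCL makes creation fail if ",<base>," already exists, so the
     * temporary file doubles as the lock other writers will see. */
    int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return -1;

    /* write(2) may be short; loop until everything is written. */
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, data + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            unlink(tmp);
            return -1;
        }
        done += (size_t)n;
    }

    /* Flush to stable storage before the new file becomes visible. */
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* Readers see either the old ,v file or the complete new one. */
    if (rename(tmp, final) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Any padding would simply be part of the data written to the
temporary file, so readers of the ,v file never see a partial
result.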

> So the issue is what happens if the interrupt
> occurs as I'm overwriting the white space to add
> a tag; hmm yes; 

Correct. Depending on the filesystem kind and the
level of I/O, your rewrite could impact up to three
fileblocks and the directory data.

> is it possible to guard against this by using a
> single call to write(2) for that? 

Not for all possible filesystem types.

> Is that the problem you are thinking of?

Yes. Even worse things can happen in this regard
if the filesystem is a 'stateless' one such as an
NFS mounted directory (we keep advising folks
against using them, but I know for a fact that
they are still used).

> > It would be more robust to enhance CVS to use an
> > external database for tagging information instead
> > of putting the tagging information into the RCS
> > files directly than to rewrite parts of the RCS
> > file and hope that the operation didn't corrupt
> > the file along the way.
> Sure, separating the tagging data out is much
> neater; but what I was looking for here was a
> simple speed up which didn't require anything
> extra and would be fully compatible with
> existing tools.

And you are finding that existing tools torture
the assumptions you are able to make about the CVS

FWIW: (In my personal experience) using a SAN
solution for your repository storage gives you
much better throughput for all write operations in
the general case, as the SAN can guarantee the
writes are safe before the disk actually performs
them.

Optimizing for tagging does not seem very useful
to me as we typically do not drop that many tags
on our repository.

> > You may wish to consider looking at Meta-CVS
> > as I believe that Kaz keeps a lot of the
> > branching information outside of the RCS files
> > already.
> > 
> > See
> > for more details on Meta-CVS.
> If I was changing to another tool then I'd have
> a much larger set of tools to consider (e.g.
> subversion) but I'd rather stick with plain CVS
> if I can - I've got clients on lots of (weird)
> OSs that work via pserver and an infinite number
> of scripts built around CVS.

Indeed. Part of the difficulty with CVS
development has been worrying about legacy
software assumptions.

        -- Mark