info-cvs

Re: Idea for reducing disk IO on tagging operations


From: Mark D. Baushke
Subject: Re: Idea for reducing disk IO on tagging operations
Date: Sun, 20 Mar 2005 23:35:08 -0800

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Paul Sander <address@hidden> writes:

> > I only create the ,foo.c, file - I don't write anything into it; the
> > existence of the file is enough to act as the RCS lock; if I can do my
> > inplace modification then I delete this file after doing it, if not
> > then
> > I proceed as normal and just write the ,foo.c, file and do the rename
> > as you normally would.
> 
> You're forgetting something:  The RCS commands will complete read-only
> operations on RCS files even in the presence of the comma files owned
> by other processes.  Your update protocol introduces race conditions
> in which the RCS file is not self-consistent at all times.

Actually, if you look closely, I believe that CVS will not do read-only
RCS operations if a CVS write-lock exists for the directory. Of course,
ViewCVS and CVSweb do it all the time as do many of the other add-ons.

> There's also the interrupt issue:  Killing an update before it
> completes leaves the RCS file corrupt.  You'd have to build in some
> kind of crash recovery.  But RCS already has that by way of the comma
> file, which can simply be deleted.  Other crash recovery algorithms
> usually involve transaction logs that can be reversed and replayed, or
> the creation of backup copies.  None of these are more efficient than
> the existing RCS update protocol.

Agreed. This is a very big deal.

Dr. David Alan Gilbert <address@hidden> writes:

> > FWIW: (In my personal experience) using a SAN
> > solution for your repository storage allows you
> > much better throughput for all write operations in
> > the general case as the SAN can guarantee the
> > writes are okay before the disk actually does it.
> 
> But when you throw a GB of writes at them in a short time from a tag
> across our whole repository they aren't going to be happy - they are
> going to want to get rid of that backlog of write data ASAP.

I believe you will find that the performance knee for a well-provisioned
commercial SAN happens when you hit 2GB of sustained writes. You are
more likely to run into problems with bandwidth to the Fibre Channel
mesh first.

For us, I seem to recall that the actual bottleneck is the creation of
the /tmp/cvs-server$$ trees for a 'cvs tag' operation. So, your results
will also depend on how shallow or deep your module hierarchy runs.

> > Optimizing for tagging does not seem very useful
> > to me as we typically do not drop that many tags
> > on our repository.
> 
> In the company I work for we are very tag heavy, but more importantly
> it is the tagging that gets in peoples way and places the strain on
> the write bandwidth of the discs/RAID.

Sure, rewriting all of the files can be very expensive on a
conventional RAID.

It is certainly possible that a close look at CVS performance
bottlenecks may find some places where improvements in throughput could
be gained. However, I am not at all certain that your particular
suggestion would be the best use of optimization time.

        Enjoy!
        -- Mark
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)

iD8DBQFCPnkr3x41pRYZE/gRAtu0AJ4qNbP4WSN9C60hZsaBejYwYcbnDACdGsOZ
RMw/SnkdG/mGOP2oyrdWnis=
=lD1h
-----END PGP SIGNATURE-----
