Re: Idea for reducing disk IO on tagging operations

From: Paul Sander
Subject: Re: Idea for reducing disk IO on tagging operations
Date: Sun, 20 Mar 2005 12:20:27 -0800

Everything that Mark says is true. I'll add that some shops optimize their read operations under certain conditions, and such optimizations would break if the RCS files were updated in place.

If the version of every file can be identified in advance (by version number, tag, or branch/timestamp pair), these shops can invoke RCS directly to fetch file versions, read metadata, and so on. This sidesteps CVS's overhead and can increase performance by as much as 50%. Such operations also succeed without interfering with write operations to the repository, such as commits and the creation of new tags. Moving tags or using "cvs admin" may sometimes cause race conditions that produce incorrect results, but that depends on the nature of the changes being made at the time and on how the versions to be read have been identified.
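
As a rough illustration of invoking RCS directly, here is a minimal Python sketch that builds a "co" command to print one known revision of an RCS file to stdout. The helper names co_command and fetch_revision are made up for this example, and it assumes the RCS "co" program is on the PATH; it is not how any particular shop does it.

```python
import subprocess


def co_command(rcs_file, revision):
    """Build an RCS `co` command line (hypothetical helper).

    -q       suppress diagnostics
    -p       print the revision to stdout instead of writing a working file
    -r<rev>  select the revision by number or symbolic name (tag)
    """
    return ["co", "-q", "-p", f"-r{revision}", rcs_file]


def fetch_revision(rcs_file, revision):
    """Run `co` and return the revision's contents as bytes.

    Assumes the version was identified in advance, as described above;
    no CVS locks are taken and the RCS file is only read.
    """
    result = subprocess.run(
        co_command(rcs_file, revision),
        check=True,
        capture_output=True,
    )
    return result.stdout
```

For example, fetch_revision("/cvsroot/module/foo.c,v", "1.5") would return the text of revision 1.5, with the repository path here being purely illustrative.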

Such an optimization works for three reasons: RCS writes the updated RCS file into the lock file rather than into the original, filesystem semantics keep the complete RCS file intact while it's being read, and pre-existing data in the RCS file are not changed during write operations (except in the race conditions I've identified above, which can be avoided).

On Mar 20, 2005, at 8:28 AM, address@hidden wrote:

Dr. David Alan Gilbert <address@hidden> writes:

So - here are my questions/ideas - I'd appreciate comments to tell
me whether I'm on the right lines:
  1) As I understand it the tag data is the
  first of the 3 main data structures in the RCS
  file (tag, comments, diffs) and that when I do
  pretty much any CVS operation I rewrite the
  whole file - is this correct?

CVS write operations on a foo.c,v repository file
will write ,foo.c, and then when the write
operation is successful and without any errors, it
does a rename (",foo.c,", "foo.c,v"); to make the
new version the official version. While the
,foo.c, file exists, RCS commands will consider
the file locked.
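
The write-then-rename sequence described above can be sketched in Python roughly as follows. This is only an illustration of the pattern, not CVS or RCS source code: the function names are hypothetical, and creating the ,foo.c, file with O_EXCL so that a second writer fails is an assumption of this sketch.

```python
import os


def lock_path(rcs_path):
    """Map ".../foo.c,v" to ".../,foo.c," (the name used in the text above)."""
    directory, name = os.path.split(rcs_path)
    if not name.endswith(",v"):
        raise ValueError("expected an RCS file name ending in ',v'")
    return os.path.join(directory, "," + name[:-2] + ",")


def rewrite_rcs_file(rcs_path, new_contents):
    """Write the new version to the lock file, then rename it into place."""
    tmp = lock_path(rcs_path)
    # Assumption: O_EXCL makes creation fail if the lock file already exists,
    # so the existence of ,foo.c, doubles as the lock.
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_contents)
            f.flush()
            os.fsync(f.fileno())
        # rename() replaces foo.c,v in one step; a reader that already has
        # the old file open keeps seeing the complete old contents.
        os.replace(tmp, rcs_path)
    except BaseException:
        # On any failure, drop the partial file and leave foo.c,v untouched.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

The point of the pattern is that foo.c,v is never modified in place: it is either the complete old version or the complete new one.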

It is desirable to use RCS write semantics, as many
other tools out there (cf. ViewCVS) use RCS on the
repository and want to obey RCS locking.

  2) White space appears to be irrelevant in RCS
  files; so adding arbitrary amounts in between
  sections should leave files still fully
  compatible with existing RCS/cvs tools.

Tools such as CVSup by default will canonicalize
the whitespace between sections (although this may
be configured). So, yes, whitespace is mostly
irrelevant between sections.

  3) So the idea is that when I add a tag I add
  a bunch of white space after the tag (let's say
  1KB of spaces split into 64 byte lines or
  similar); when I come to add the next tag I
  check if there is plenty of white space, if
  there is then instead of rewriting the file I
  just overwrite the white space with my new tag
  data; if there is no space then as I rewrite
  the file I add another lump of white space.
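
To make the proposal in 3) concrete, here is a small Python sketch of the bookkeeping only: generating a padding block and deciding whether a new tag entry fits into it. It works on bytes in memory and does not touch a file. The function names and the example tag entry are hypothetical, and where exactly the padding would sit inside the RCS file is not specified in this thread, so it is left out.

```python
def make_padding(total=1024, line=64):
    """Return `total` bytes of spaces, broken into `line`-byte lines.

    Each line is (line - 1) spaces plus a newline, so the defaults give
    1KB of white space split into 64-byte lines, as suggested above.
    """
    full, rest = divmod(total, line)
    rows = [b" " * (line - 1) + b"\n"] * full
    if rest:
        rows.append(b" " * (rest - 1) + b"\n")
    return b"".join(rows)


def overwrite_padding(padding, entry):
    """Return the padding with `entry` written over its start, or None.

    None means there is not enough white space left and the caller would
    have to fall back to rewriting the whole file. The result has the same
    length as the input, so nothing after it would move.
    """
    if padding.strip() != b"":
        raise ValueError("padding region must contain only white space")
    # Require at least one byte of white space to remain after the entry.
    if len(entry) >= len(padding):
        return None
    return entry + padding[len(entry):]
```

For instance, overwrite_padding(make_padding(), b"\tREL_1:1.5\n") returns a 1024-byte block that starts with the new entry, while a block that is too small yields None.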

This has the potential to more easily corrupt the
RCS file if the operation is interrupted for any reason.

  4) Whether dummy white space is added and how
  much is controlled by the existing size of the
  RCS file; so an RCS file that is only a few KB
  won't have any space added; that way this
  mechanism doesn't slow down/bloat small
  repositories. The amount of white space might
  be chosen to align data structures with disk
  block boundaries.

  5) My main concern is to do with
  concurrency/consistency requirements; is the
  file rewrite essential to ensure consistency,
  or is the locking that is carried out

Does this make sense?

It would be more robust to enhance CVS to use an
external database for tagging information instead
of putting the tagging information into the RCS
files directly than to rewrite parts of the RCS
file and hope that the operation didn't corrupt
the file along the way.

You may wish to consider looking at Meta-CVS as I
believe that Kaz keeps a lot of the branching
information outside of the RCS files already.

for more details on Meta-CVS.

        Good luck,
        -- Mark


Paul Sander       | "When a true genius appears in the world, you may
address@hidden | know him by this sign:  that all the dunces are in
| confederacy against him." -- Jonathan Swift, writer.