Re: Idea for reducing disk IO on tagging operations

From: Mark D. Baushke
Subject: Re: Idea for reducing disk IO on tagging operations
Date: Sun, 20 Mar 2005 08:28:39 -0800

Dr. David Alan Gilbert <address@hidden> writes:

> So - here are my questions/ideas - I'd appreciate comments to tell
> me whether I'm on the right lines:
>   1) As I understand it the tag data is the
>   first of the 3 main data structures in the RCS
>   file (tag, comments, diffs) and that when I do
>   pretty much any CVS operation I rewrite the
>   whole file - is this correct?

CVS write operations on a foo.c,v repository file
first write the new contents to ,foo.c, and then,
once that write has completed without any errors,
do a rename(",foo.c,", "foo.c,v") to make the new
version the official version. While the ,foo.c,
file exists, RCS commands will consider the file
locked.
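
As a rough illustration of the write-then-rename
sequence described above, here is a minimal Python
sketch. It is not CVS code: the function name
rcs_style_write is hypothetical, and the exclusive
create, the fsync call and the file mode are
assumptions added for the example.

```python
import os


def rcs_style_write(repo_file: str, new_contents: bytes) -> None:
    """Write new_contents to repo_file via a ',name,' temporary, then rename.

    Hypothetical helper for illustration only; not taken from CVS or RCS.
    """
    directory, name = os.path.split(repo_file)
    if not name.endswith(",v"):
        raise ValueError("expected an RCS file name ending in ',v'")

    # foo.c,v -> ,foo.c,
    tmp_path = os.path.join(directory, "," + name[:-2] + ",")

    # O_EXCL makes creation fail if ,foo.c, already exists, so the
    # temporary file also acts as the lock mentioned in the text.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_contents)
            tmp.flush()
            os.fsync(tmp.fileno())  # assumption: force data to disk first
    except BaseException:
        # The write failed: drop the temporary and keep the old file.
        os.unlink(tmp_path)
        raise

    # On POSIX this is rename(2): readers see the old file or the
    # new one, never a partially written file.
    os.replace(tmp_path, repo_file)
```

If the process dies before the rename, foo.c,v is
untouched and only a stale ,foo.c, is left behind.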

It is desirable to use RCS write semantics, as many
other tools out there (e.g., ViewCVS) use RCS on
the repository and want to obey RCS locking.

>   2) White space appears to be irrelevant in RCS
>   files; so adding arbitrary amounts in between
>   sections should leave files still fully
>   compatible with existing RCS/cvs tools.

Tools such as CVSup by default will canonicalize
the whitespace between sections (although this may
be configured). So, yes, whitespace is mostly
irrelevant between sections.

>   3) So the idea is that when I add a tag I add
>   a bunch of white space after the tag (let's say
>   1KB of spaces split into 64 byte lines or
>   similar); when I come to add the next tag I
>   check if there is plenty of white space, if
>   there is then instead of rewriting the file I
>   just overwrite the white space with my new tag
>   data; if there is no space then as I rewrite
>   the file I add another lump of white space.

This has the potential to more easily corrupt the
RCS file if the operation is interrupted for any
reason.
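
To make the risk concrete, here is a small Python
sketch that overwrites reserved padding in place
and simulates an interruption partway through. The
header is a simplified stand-in for the start of an
RCS file, and the helper names, the padding size
and the crash point are all artificial assumptions
chosen for the example.

```python
import os
import tempfile


def overwrite_in_place(path, offset, data, crash_after=None):
    """Write data over existing bytes at offset.

    crash_after (hypothetical knob) truncates the write to simulate
    an interruption partway through.
    """
    if crash_after is not None:
        data = data[:crash_after]
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


# Simplified stand-in for the start of an RCS file, with padding
# reserved before the ';' that ends the symbols (tag) list.
padding = b" " * 32
header = (b"head\t1.2;\naccess;\nsymbols\n\tREL_1:1.1"
          + padding + b";\nlocks; strict;\n")


def demo():
    """Return the file contents after an interrupted in-place tag write."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "foo.c,v")
        with open(path, "wb") as f:
            f.write(header)
        new_tag = b"\n\tREL_2:1.2"
        # Only the first 7 bytes ("\n\tREL_2") reach the file.
        overwrite_in_place(path, header.index(padding), new_tag,
                           crash_after=7)
        with open(path, "rb") as f:
            return f.read()
```

After demo() the live file holds a REL_2 entry with
no revision number, and there is no untouched copy
to fall back on, unlike the ,foo.c, and rename
approach.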

>   4) Whether dummy white space is added and how
>   much is controlled by the existing size of the
>   RCS file; so an RCS file that is only a few KB
>   won't have any space added; that way this
>   mechanism doesn't slow down/bloat small
>   repositories. The amount of white space might
>   be chosen to align data structures with disk
>   block boundaries.
>   5) My main concern is to do with
>   concurrency/consistency requirements; is the
>   file rewrite essential to ensure consistency,
>   or is the locking that is carried out
>   sufficient?
> Does this make sense?

It would be more robust to enhance CVS to use an
external database for tagging information instead
of putting the tagging information into the RCS
files directly than to rewrite parts of the RCS
file and hope that the operation didn't corrupt
the file along the way.
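
A minimal sketch of what such an external tag store
might look like, using Python's sqlite3 module. The
schema and the function names are hypothetical and
are not part of CVS; the point is only that tagging
many files becomes one database transaction rather
than a rewrite of every RCS file.

```python
import sqlite3


def open_tag_db(path=":memory:"):
    """Open (or create) a hypothetical tag database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tags ("
        " rcs_file TEXT NOT NULL,"
        " tag TEXT NOT NULL,"
        " revision TEXT NOT NULL,"
        " PRIMARY KEY (rcs_file, tag))"
    )
    return conn


def set_tags(conn, tag, revisions):
    """Apply one tag to many files; revisions maps rcs_file -> revision."""
    with conn:  # commits on success, rolls back if an error is raised
        conn.executemany(
            "INSERT OR REPLACE INTO tags (rcs_file, tag, revision)"
            " VALUES (?, ?, ?)",
            [(rcs_file, tag, rev) for rcs_file, rev in revisions.items()],
        )


def lookup(conn, rcs_file, tag):
    """Return the revision a tag points at, or None if it is not set."""
    row = conn.execute(
        "SELECT revision FROM tags WHERE rcs_file = ? AND tag = ?",
        (rcs_file, tag),
    ).fetchone()
    return row[0] if row else None
```

For example, set_tags(conn, "REL_1", {"foo.c,v":
"1.2", "bar.c,v": "1.5"}) records the tag for both
files in a single transaction. One trade-off to keep
in mind is that tools which read tags straight from
the RCS files would not see tags stored this way.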

You may wish to consider looking at Meta-CVS as I
believe that Kaz keeps a lot of the branching
information outside of the RCS files already.

for more details on Meta-CVS.

        Good luck,
        -- Mark