[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[patch #4573] Fix for keyword expansion problem/mis-feature during commi

From: Rahul Bhargava
Subject: [patch #4573] Fix for keyword expansion problem/mis-feature during commit
Date: Fri, 28 Oct 2005 14:28:54 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4


                 Summary: Fix for keyword expansion problem/mis-feature
during commit
                 Project: Concurrent Versions System
            Submitted by: b_rahul
            Submitted on: Fri 10/28/05 at 14:28
                Category: None
                Priority: 9 - Immediate
                  Status: None
                 Privacy: Public
             Assigned to: None
        Originator Email: 
             Open/Closed: Open
           Fixed Release: None
   Fixed Feature Release: None



The cvs server will automatically (or, more accurately, as part of the
update run that automatically happens after a commit) expand valid keyword
strings in a text file as part of the commit operation. In other words it is
the responsibility of the 'server-side' of cvs to fill in the appropriate
values for the RCS keywords.

Currently where this breaks down is when the end-user modifies the content of
the RCS keywords. For example, let's say a file contains:

File : foo
Line 1: $Id$

On the initial commit, the server returns the file (if keyword expansion is
not turned off) and expands Id as -

File: foo
$Id: id,v 1.1 2005/10/28 18:39:46 userX Exp $

If the user accidentally or intentionally changes just the expanded value (no
other change) on the client-side to:

File: foo (revision 1.1 in user sandbox)
$Id: id,v 1.9 3005/10/28 18:39:46 userYYY Exp $

and then does a commit, the CVS server as it stands today will be happy to
commit a new version but as part of the update run that automatically happens
after a commit, will re-expand the keyword and the CVS client will have the
following file in the sandbox :

File: foo 
$Id: id,v 1.2 2005/10/28 18:46:18 userX Exp $

In other words the user's intent of modifying the keyword content is not
honored. This make sense as keyword expansion is owned by the server and a
client can not wily-nilly change the keyword content (between the two $, with
the exception of $Log$ keyword of-course)

However, this also causes spurious revisions to be created in the CVS
repository and this revision is not what the end user expects. The revision
metadata within the $..$ keyword expansion is controlled by the server and as
seen in the above example, the user can't change it even though his attempt to
do so results in a new revision being committed.

We have come across large CVS repositories where lots of such spurious
revisions (no content change except keyword) exist on several files. This
causes problems downstream when for example doing branch merges, as
differences on just keyword expansion can cause the file to be treated as
changed.  Creating new revisions is an important step. It has consequence on
downstream tools. Per-user stats (how may commits by a given user, for
example) are impacted when revisions are created. One way to deal with this
problem is to turn off keyword expansion (as available in 1-12-XX feature
tree or use -ko for every file that has keywords), but that is a sledgehammer
approach that denies the benefits of RCS keywords to the end users.

The patch (off the 1-11-21 tree) that we have attached addresses this
bug/mis-feature. It ensures keyword expansion will not  cause  spurious
revisions that differ just in keyword content that the CVS server inserts
between $..$ markers. With the patch, a CVS user can still use keyword
expansion as expected, but
during the file classification phase of a CVS commit, the patch kicks in and
triggers a keyword-sensitive
diff algorithm, that we have developed, which will ignore RCS/CVS keyword
differences during this phase.
This reduces spurious revision creation. The patch deals with all keywords
except the logmsgs after a $Log$ keyword. 

The patch uses a space-time efficient diff'ing algorithm that does not
require the entire file
to be read into memory to do the keyword canonicalization. It operates on a
block at
a time. The algorithm does not need to make multiple passes. As described in
src comments, the algorithm is implemented as a finite state machine that
tracks the
state across block comparisons and can deal with keywords straddled across
boundaries etc. Right now, the patch is turned on by default. This behavior
can be
controlled via a new option that we have added to CVSROOT/config:
If set to "no" it will switch back to the old semantics, just in case someone
needs that.
The patch also builds a new executable "kwdiff" that can be used for
comparing checked-out files 
such that keyword differences can be ignored. This is useful when comparing
between two users each with their own sand-box. It is also used for unit
testing the implementation(see below).

As part of the patch, we have added new tests to sanity.sh. We have also
fixed testcases
that were expecting the old behavior (spurious revisions). To test the
algorithm for every
possible corner case, we have also devised a unit test that generates
millions of test file
permutations. These are perl based and we didn't integrate them into
sanity.sh as it would
change the standalone nature of the script. Memory tools like valgrind have
been run on the patch to ensure there are no memory errors.

In the future, it may make more sense to create a unit test
subdir for testing components of cvs (as opposed to the full cvs executable),
in which case the kwdiff unit tests
could be checked in there. 

This patch enables the keyword sensitive diff algorithm only during commits,
the CVS community may want to consider it for updates, especially to reduce
keyword related conflicts during branch merges. Given how the code is
structured it is
relatively easy to turn on and off the keyword canonicalization algorithm
other cases beyond commits. Today CVS reads the entire RCS file into memory
for many
operation. This increases the footprint of the CVS process and can be a
scalability bottleneck.
The file chunk management code in kwfilecmp.c shows that it is possible to
operate on a 
block basis, thus bounding the footprint of the cvs server process,
regardless of the size of the file being processed.
Perhaps, in the future, this module could be re-used for eliminating the need
to read in the
entire file into memory for many CVS operations.

The patch has been developed off the 1.11.21 branch.  It can easily be
 for the feature branch to take into account local keyword aliases. This will
require changes to a single function ( is_valid_keyword).

Attached bzip2 file contains the kwdiff.patch (actual patch) and the
kwdiff-unit-test/ subdir.

We hope the CVS community will find the patch useful.


Carbon-Copy List:

CC Address                          | Comment
aahlad@wandisco.com                 | 


File Attachments:

Date: Fri 10/28/05 at 14:28  Name: kwdiff.tbz2  Size: 13.6KB   By: b_rahul
Tar with patch file etc


Reply to this item at:


  Message sent via/by Savannah

reply via email to

[Prev in Thread] Current Thread [Next in Thread]