info-cvs

Re: importing vendor branches


From: Michael A. Fetterman
Subject: Re: importing vendor branches
Date: Thu, 12 Jul 2001 12:56:11 -0700

I wasn't happy with any of the vendor branch suggestions presented
on this mailing list, though Rickard Parker's summary (dated
04 May 2001) came close.

Here are my arguments against the current semantics of "cvs import":

"cvs import" assumes that you are trying to merge third party code into your
mainline at the time of the import.  It assumes that you can do this rather
quickly (i.e. the mainline is corrupted until you have finished whatever
merges are necessary and committed them).  It assumes that you *want* to do
the merge, as opposed to simply recording the third party source on a branch
for convenient tracking/diff'ing/etc.

Fundamentally, "cvs import" as currently implemented is not compatible with
the idea (or attempts at the idea) of atomic commits.  Some revision
control systems (not CVS) have the concept of "change sets".  Translated to
CVS-speak, a change set is roughly the equivalent of one commit.  The basic
idea of a change set is that you either have ALL the changes in a given
change set, or none of them.  You can never see "parts" of a change set.
"cvs import" doesn't currently allow an easy emulation of this behavior.

Specific evil behaviors of cvs import:

1) If "cvs import" is creating a new file in the repository, it puts a copy
   of the new file contents into revision 1.1 (i.e. on the mainline).  It
   really should have left the mainline alone until you merged this latest
   import delta (i.e. "update -j lastreleasetag -j thisreleasetag") onto
   the mainline.  The mere existence of this new file on the mainline can
   create problems.  (Ideally, it should have labelled 1.1 with state
   "dead", and put the file into the Attic.)

2) If "cvs import" is creating a new file in the repository, it sets the
   default branch of that RCS file to be the vendor branch.  This is
   normally set back to the mainline when the first commit occurs to this
   file's mainline, but until such a commit occurs, subsequent
   "cvs import"s will be visible in checkouts/updates of the mainline.
   That is clearly wrong: only the files still tracking the vendor branch
   pick up the new import, so the mainline shows *some* of its changes but
   not others.

3) "cvs import" requires a numeric vendor branch of the form "X.Y.Z"; by
   default, it uses "1.1.1"...  If revision 1.1 doesn't exist (e.g. a
   pre-existing file which doesn't have a revision 1.1 for whatever
   reason), it just fails.  While you can override the "1.1.1" default
   branch with the "import -b" option, you have to provide a numeric
   branch number; you can't use a symbolic branch name (wouldn't it make
   more sense if you provided the vendor tag?)!  Finding a single numeric
   branch number for an entire repository is not possible in general.

While #3 is annoying, I haven't bothered to fix it.  Yet.

One proposed fix (for #1 and #2 only):
For each file newly created by a cvs import, do a "cvs delete" on that file.
That will cause its default branch to be set back to the mainline, and will
effectively remove the unwanted file contents from the mainline.
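For illustration, the fix boils down to one "cvs delete -f" over the affected
files followed by one commit.  The sketch below is a minimal, hypothetical
helper (the name fixup_commands, the batch size and the commit message are
all assumptions, not part of the script further down): given a list of files
already known to be fresh imports, it splits them into batches, much as xargs
would, and returns argument vectors suitable for system(), so file names never
pass through a shell or get split on whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a list of files created by "cvs import" into the
# cvs commands that remove them from the mainline.  Returns a list of array
# refs, each one an argument vector suitable for system(@$cmd).
sub fixup_commands {
    my ($cvsroot, $batch_size, @files) = @_;
    my @cmds;

    # Delete the files in batches of at most $batch_size names each.
    while (my @batch = splice(@files, 0, $batch_size)) {
        push @cmds, ['cvs', '-f', '-d', $cvsroot, 'delete', '-f', @batch];
    }

    # Commit once, after all of the deletes have been scheduled.
    push @cmds, ['cvs', '-f', '-d', $cvsroot, 'commit', '-m',
                 'cvs-import-fixup: remove newly imported files from mainline']
        if @cmds;
    return @cmds;
}

# Dry run: print the commands instead of passing them to system().
for my $cmd (fixup_commands('/cvsroot', 2, 'a.c', 'b.c', 'my file.h')) {
    print join(' ', @$cmd), "\n";
}
```

In a real run you would execute each vector with system(@$cmd) from inside a
mainline checkout and stop at the first non-zero exit status.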

I believe this should be implemented as an additional switch to "cvs
import" (or, IMHO, as its default behavior, with a switch for backwards
compatibility), but I chose to write a Perl script to handle it instead, so
that I can continue using various cvs versions/binaries on machines that I
don't administer.  I've included the script below for anyone who might be
interested.  Since the script's actions are not atomic with the actual
import, there is a small window during which "cvs import" has still
corrupted the repository's view of the mainline, but the script repairs it
quickly enough for my purposes.

Assuming I just did:
   cvs -d <cvsroot> import <flags> <directory> <vendortag> <releasetag>

Then if I ignore the message cvs spits out about how to do a merge, and
instead I immediately run this script, the script does the following:
  a) creates a temp area in /tmp
  b) does a "cvs -d <cvsroot> co <directory>" in this temp area
  c) runs "cvs log", and parses the output to find the files that were
     created by this particular "cvs import"
  d) runs "cvs delete -f" on them
  e) runs "cvs commit"
  f) removes its temp area
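The file-selection test behind step (c) can be sketched as a standalone
predicate.  This is a hypothetical helper, not the script's own code: it
assumes the per-file data has already been pulled out of "cvs log" into a
hash (the layout shown in the comment, and the name is_fresh_import, are
assumptions), and it applies the criteria the script uses: exactly two
revisions, all tags present, matching dates and authors, state "Exp", and an
empty delta on the vendor-branch revision.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: does one file's "cvs log" data look like a file that
# was freshly created by the import being repaired?  Assumed layout:
#   { total => 2,
#     tags  => { VENDORTAG => '1.1.1', RELTAG => '1.1.1.1' },
#     revs  => { '1.1'     => { date => ..., author => ..., state => ..., lines => ... },
#                '1.1.1.1' => { ... } } }
sub is_fresh_import {
    my ($info, $vendor, @release_tags) = @_;
    return 0 unless @release_tags;

    # Exactly two revisions: the trunk copy and the vendor-branch copy.
    return 0 unless $info->{total} == 2;

    # The vendor tag and every release tag must be present.
    for my $tag ($vendor, @release_tags) {
        return 0 unless exists $info->{tags}{$tag};
    }

    # The vendor tag must be a branch number A.B.C; A.B is the trunk revision.
    my $vendor_branch = $info->{tags}{$vendor};
    return 0 unless $vendor_branch =~ /^(\d+\.\d+)\.\d+$/;
    my $trunk = $1;

    # The first release tag must name a revision on that same vendor branch.
    my $branch_rev = $info->{tags}{ $release_tags[0] };
    return 0 unless $branch_rev =~ /^\Q$vendor_branch\E\.\d+$/;

    my $on_trunk  = $info->{revs}{$trunk}      or return 0;
    my $on_branch = $info->{revs}{$branch_rev} or return 0;

    # Same timestamp and author, both in state "Exp", and an empty branch delta.
    return 0 unless $on_trunk->{date} eq $on_branch->{date}
                 && $on_trunk->{author} eq $on_branch->{author};
    return 0 unless $on_trunk->{state} eq 'Exp' && $on_branch->{state} eq 'Exp';
    return 0 unless $on_branch->{lines} eq '+0 -0';
    return 1;
}

my %file = (
    total => 2,
    tags  => { FOO_VENDOR => '1.1.1', FOO_1_0 => '1.1.1.1' },
    revs  => {
        '1.1'     => { date => '2001/07/12 19:56:11', author => 'maf',
                       state => 'Exp', lines => '' },
        '1.1.1.1' => { date => '2001/07/12 19:56:11', author => 'maf',
                       state => 'Exp', lines => '+0 -0' },
    },
);
print is_fresh_import(\%file, 'FOO_VENDOR', 'FOO_1_0') ? "fresh import\n" : "leave alone\n";
# prints: fresh import
```

Checking the release tag against the vendor branch number itself (rather than
against any A.B.C.D revision) keeps files that belong to a different vendor
branch from being picked up by mistake.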

Once this is completed, I can choose to merge the newly imported stuff
onto the mainline whenever I want to...  Or perhaps never merge it...
The mainline no longer shows *any* impact from the "cvs import".

To do the merge, I use either "cvs co -P" or "cvs update -d -P" with two
"-j" arguments, as follows:

- If this is my very first merge from the <vendortag> branch, then the
  first argument is "-j 1.0".  Otherwise, I use "-j <oldreleasetag>", where
  <oldreleasetag> is the release tag of the import I last merged from this
  vendor branch.  This assumes that revision 1.0 doesn't exist for any
  files in the repository (which is normally the case for a cvs
  repository).

- The second argument should be "-j <releasetag>", the release tag of the
  import I am merging now.
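The two rules above can be made concrete with a small sketch that only
assembles the command line; it does not run cvs.  The subroutine
merge_command and its parameters are hypothetical, and the sketch relies on
the same assumption as the text: no file in the repository has a revision
1.0, so "1.0" serves as the first -j value on a first merge.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the "cvs update" argument list for merging a
# vendor import onto the mainline.  $last_merged is the release tag of the
# import previously merged from this vendor branch, or undef for a first
# merge (in which case the nonexistent revision 1.0 is used instead).
sub merge_command {
    my ($cvsroot, $module, $new_release, $last_merged) = @_;
    my $from = defined $last_merged ? $last_merged : '1.0';
    return ('cvs', '-d', $cvsroot, 'update', '-d', '-P',
            '-j', $from, '-j', $new_release, $module);
}

# Example: merging the FOO_1_1 import after FOO_1_0 was merged earlier.
my @cmd = merge_command('/cvsroot', 'foo', 'FOO_1_1', 'FOO_1_0');
print "@cmd\n";
# prints: cvs -d /cvsroot update -d -P -j FOO_1_0 -j FOO_1_1 foo
```

Handing the list to system(@cmd) would run cvs without a shell in between, so
tag names need no quoting.  As with the manual procedure, you would run it
inside a mainline working area and commit only once the merge looks right.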

Handle the merge like any other merge, and commit it when you are good and
ready.  And not before.

The script takes care of prompting the user with the right command line to
use for the merge.  It also spits out these basic instructions with a "-h"
option, or a long involved diatribe (similar to this email) with the
"--diatribe" option.  :)

The script is particularly designed to work correctly in the presence of
multiple cvs vendor branches (a la "cvs import -b"), but does nothing to fix
the brain-dead requirement that "-b" take a numeric branch number.

Michael Fetterman



#!/usr/bin/perl -w

# Placeholder for stuff to be overridden via command line (if/when real
# command line parsing is added to this script).
#
$cvs_binary = "cvs";

sub usage {
    my($prog) = $0;
    $prog =~ s{.*/}{};

    print "$prog: @_\n" if @_;
    print(<<EOM);
Usage:

    $prog -h
                     Produces this usage message.
    $prog --diatribe
                     Emits a diatribe about why "cvs import" is broken.

Assuming you've just executed a "cvs import" of roughly the following syntax:

    cvs -d <cvsroot> import [<flags>] <repository> \\
        <vendortag> <tag1> [<tag2>...]
or...
    env CVSROOT=<cvsroot> cvs import [<flags>] <repository> \\
        <vendortag> <tag1> [<tag2>...]

Then the correct usage for this script is:

    $prog <cvsroot> <repository> <vendortag> <tag1> [<tag2>...]

EOM
    exit(@_ ? 1 : 0);
}

# Parse the arguments.  Don't strain yourself...
#
&diatribe() if (@ARGV && $ARGV[0] eq "--diatribe");
&usage() if (@ARGV && $ARGV[0] =~ m/^-/);
&usage("too few arguments") if @ARGV < 4;

($opt_cvsroot, $opt_repository, $opt_vendor, @opt_tags) = @ARGV;

# Create a temporary working area
#
$tmpdir = $0;
$tmpdir =~ s{.*/}{}; # remove any leading pathnames
$tmpdir = "/tmp/$tmpdir.$$";
&System("/bin/rm -rf $tmpdir");
&Mkdir($tmpdir);

# Checkout a copy of the repository.
#
# This checkout needs to be "on the mainline" so that the subsequent
# "cvs delete" will emulate the removal of the offending files from the
# mainline.
#
&Chdir($tmpdir);
&System("$cvs_binary -f -d $opt_cvsroot co -d mainline $opt_repository");

# Grab a copy of the cvs log.
#
# Too bad we can't assume cvs 1.11 (where "cvs rlog" does the right thing).
# For now, we'll have to continue to be cvs 1.10 (and earlier) compatible.
#
&Chdir("mainline");
&System("$cvs_binary -f -d $opt_cvsroot log > ../log.stdout 2> ../log.stderr");

# Parse the cvs log
#
($status) = &parse_cvs_log("< $tmpdir/log.stdout");

# Find our "problem children".
#
@problem_children = ();
fileloop:
    foreach $file (sort keys %{$status->{total_revisions}}) {

        # skip it unless it has exactly two revisions checked in
        #
        next unless $status->{total_revisions}{$file} == 2;

        # skip it unless it has the vendor tag and every one of the release tags
        #
        next unless exists $status->{tags}{$file}{$opt_vendor};
        foreach $tag (@opt_tags) {
            next fileloop unless exists $status->{tags}{$file}{$tag};
        }

        # skip it if the vendor tag isn't a branch number of the form A.B.C;
        # $mainline becomes A.B, the trunk revision the vendor branch sprouts from
        #
        my $vendor_branch = $status->{tags}{$file}{$opt_vendor};
        my $mainline = $vendor_branch;
        next unless $mainline =~ s/^(\d+\.\d+)\.\d+$/$1/;

        # skip it unless the first release tag names a revision on that vendor branch
        #
        my $branch = $status->{tags}{$file}{$opt_tags[0]};
        next unless $branch =~ m/^\Q$vendor_branch\E\.\d+$/;

        # skip it if either mainline revision or branch revision doesn't exist
        #
        next unless exists $status->{date}{$file}{$mainline};
        next unless exists $status->{date}{$file}{$branch};

        # skip it if the (dates, authors) don't match
        #
        next unless $status->{date}{$file}{$mainline} eq $status->{date}{$file}{$branch};
        next unless $status->{author}{$file}{$mainline} eq $status->{author}{$file}{$branch};

        # skip it if either revision's "state" is not "Exp"
        #
        next unless $status->{state}{$file}{$mainline} eq "Exp";
        next unless $status->{state}{$file}{$branch} eq "Exp";

        # skip it if there's a difference between the mainline and the branch
        #
        next unless $status->{lines}{$file}{$branch} eq "+0 -0";

        # OK, I'm convinced.  This is one of those cases we're looking for...
        #
        push(@problem_children, $file);
    }

if (@problem_children) {

    # Note: xargs splits its input on whitespace, so file names containing
    # spaces will not be handled correctly here.
    open(XARGS, "| xargs -t $cvs_binary -f -d $opt_cvsroot delete -f")
        || die "$0: popen(xargs) failed: $!\n";
    print XARGS join("\n", @problem_children), "\n";
    close(XARGS);
    if ($?) {
        die "$0: xargs failed, exit status $?\n";
    }

    # Commit those "cvs delete"s to the repository.
    #
    &System("$cvs_binary -f -d $opt_cvsroot commit -m \"cvs-import-fixup: removal of new files from mainline\"");
}

# Remove the working area.
#
&System("/bin/rm -rf $tmpdir");

print(<<EOM);

You are now ready to do a merge.
Use either "cvs co -P" or "cvs update -d -P" with two "-j" arguments,
as follows:

- If this is your first merge from the "$opt_vendor" vendor branch, then the
  first argument is "-j 1.0".  Otherwise, use "-j <oldreleasetag>", where
  <oldreleasetag> is the release tag of the import you last merged from
  this vendor branch.

- The second argument should be "-j $opt_tags[0]".

EOM

exit 0;

use strict;

# parse_cvs_log
#
# Reads the output of "cvs log" from a file...
#
# Creates/returns the following data structures:
#   $result->{total_revisions}{<working_file>} = <revisioncount>
#   $result->{tags}{<working_file>}{<tagname>} = <tagvalue>
#   $result->{revisions}{<working_file>} = [ <list-of-revisions> ]
#   $result->{date}{<working_file>}{<revision>} = <date>
#   $result->{author}{<working_file>}{<revision>} = <author>
#   $result->{state}{<working_file>}{<revision>} = <state>
#   $result->{lines}{<working_file>}{<revision>} = <lines_in_delta_text>
#
sub parse_cvs_log {
    my($file) = @_;

    # Return values
    #
    my %status;

    local(*LOG);
    my $watch_next_line = "nothing";
    my($working_file, $total_revisions, $revision, $date, $author, $state, 
$lines);

    open(LOG, "$file") || die "$0: open($file): $!\n";
    while (<LOG>) {
        chomp;

        if (m?^Working file: (.*)?) {
            $working_file = $1;
            next;
        };

        if (m?^total revisions: (\d+);?) {
            $total_revisions = $1;
            if (defined($working_file) && ! exists $status{total_revisions}{$working_file}) {
                $status{total_revisions}{$working_file} = $total_revisions;
            } elsif (! defined($working_file)) {
                warn "$0: parse_cvs_log: $file($.): no obvious working file?!?";
            } else {
                warn "$0: parse_cvs_log: $file($.): working file $working_file appears in log multiple times?!?";
            }
            next;
        };

        if (m?^symbolic names:$?) {
            if (defined($working_file)) {
                while (<LOG>) {
                    chomp;
                    if (m/^\s+(\S+): (\d+(\.\d+)+)$/) {
                        $status{tags}{$working_file}{$1} = $2;
                        next;
                    }
                    last;
                }
                redo;
            } else {
                warn "$0: parse_cvs_log: $file($.): no obvious working file?!?";
            }
        };

        if (m/^-{28}$/) {
            $watch_next_line = "revision";
            next;
        } elsif ($watch_next_line eq "revision" && m/^revision (\d+(\.\d+)+)$/) {
            $revision = $1;
            $watch_next_line = "date";
            next;
        } elsif ($watch_next_line eq "date"
                 && m=^date: (\d\d\d\d/\d\d/\d\d \d\d:\d\d:\d\d);\s+author: (\S+);\s+state: (\S+);(\s+lines: (.*))?=) {
            $date = $1;
            $author = $2;
            $state = $3;
            $lines = $5;

            $watch_next_line = "nothing";

            if (defined($working_file) && defined($revision)) {
                push(@{$status{revisions}{$working_file}}, $revision);
                $status{date}{$working_file}{$revision} = $date;
                $status{author}{$working_file}{$revision} = $author;
                $status{state}{$working_file}{$revision} = $state;
                $status{lines}{$working_file}{$revision} = (defined($lines) ? $lines : "");
            } else {
                if (! defined($working_file)) {
                    warn "$0: parse_cvs_log: $file($.): no obvious working file?!?";
                }
                if (! defined($revision)) {
                    warn "$0: parse_cvs_log: $file($.): no obvious revision?!?";
                }
            }
            next;
        } else {
            $watch_next_line = "nothing";
        }

        if (m/^={77}$/) {
            reset;
            $working_file = undef;
            $total_revisions = undef;
            $watch_next_line = "nothing";
            $date = undef;
            $author = undef;
            $state = undef;
            $lines = undef;

            next;
        }

    }
    close(LOG);

    return(\%status);
}

sub System {
    print "@_\n";
    system(@_);
    if ($?) {
        die "$0: \"@_\" exited abnormally, exit status $?\n";
    }
}

sub Mkdir {
    print "mkdir $_[0]\n";
    mkdir($_[0], 0755) || die "$0: mkdir($_[0]) failed: $!\n";
}

sub Chdir {
    print "chdir $_[0]\n";
    chdir($_[0]) || die "$0: chdir($_[0]) failed: $!\n";
}

sub diatribe {
    print <DATA>;
    exit(0);
}

__DATA__
Arguments against the current semantics of "cvs import":

"cvs import" assumes that you are trying to merge third party code into your
mainline at the time of the import.  It assumes that you can do this rather
quickly (i.e. the mainline is corrupted until you have finished whatever
merges are necessary and committed them).  It assumes that you *want* to do
the merge, as opposed to simply recording the third party source on a branch
for convenient tracking/diff'ing/etc.

Fundamentally, "cvs import" as currently implemented is not compatible with
the idea (or attempts at the idea) of atomic commits.  Some revision
control systems (not CVS) have the concept of "change sets".  Translated to
CVS-speak, a change set is roughly the equivalent of one commit.  The basic
idea of a change set is that you either have ALL the changes in a given
change set, or none of them.  You can never see "parts" of a change set.
"cvs import" doesn't currently allow an easy emulation of this behavior.

Specific evil behaviors of cvs import:

1) If "cvs import" is creating a new file in the repository, it puts a copy
   of the new file contents into revision 1.1 (i.e. on the mainline).  It
   really should have left the mainline alone until you merged this latest
   import delta (i.e. "update -j lastreleasetag -j thisreleasetag") onto
   the mainline.  The mere existence of this new file on the mainline can
   create problems.  (Ideally, it should have labelled 1.1 with state
   "dead", and put the file into the Attic.)

2) If "cvs import" is creating a new file in the repository, it sets the
   default branch of that RCS file to be the vendor branch.  This is
   normally set back to the mainline when the first commit occurs to this
   file's mainline, but until such a commit occurs, subsequent
   "cvs import"s will be visible in checkouts/updates of the mainline.
   That is clearly wrong: only the files still tracking the vendor branch
   pick up the new import, so the mainline shows *some* of its changes but
   not others.

3) "cvs import" requires a numeric vendor branch of the form "X.Y.Z"; by
   default, it uses "1.1.1"...  If revision 1.1 doesn't exist (e.g. a
   pre-existing file which doesn't have a revision 1.1 for whatever
   reason), it just fails.  While you can override the "1.1.1" default
   branch with the "import -b" option, you have to provide a numeric
   branch number; you can't use a symbolic branch name (wouldn't it make
   more sense if you provided the vendor tag?)!  Finding a single numeric
   branch number for an entire repository is not possible in general.

Proposed fix (for #1 and #2 only):
For each file newly created by a cvs import, do a "cvs delete" on that file.
That will cause its default branch to be set back to the mainline, and will
effectively remove the unwanted file contents from the mainline.

Question: How do you figure out which files were just created by the most
recent "cvs import", especially without capturing the output from that "cvs
import"?

Answer: It will be files which have exactly two revisions (1.1 and
1.1.1.1).  The timestamps on those revisions will match, the authors will
be the same, and the size of the 1.1.1.1 revision's deltatext will be zero.
Both revisions will be in state "Exp".  All of the tags provided by the
"cvs import" will be present and bound as expected (the vendor tag and
potentially numerous release tags).

Suggested Methodology for "cvs import":

1) Checkin/import of new vendor source code:

   cd <source-tree-from-external-source>
   cvs -d <cvsroot> import <flags> \
       <directory-within-repository> <vendortag> <releasetag...>

Think of the vendortag as a *branch* name.
The releasetags should be different for each import.

This import may or may not generate a message about how to merge conflicts.
Ignore that message.  Instead, you'll want to use the merges as described in
step #3, below (but don't skip step #2).

2) Repair the damage done to the repository by "cvs import":

   cvs-import-fixup <cvsroot> <directory-within-repository> \
       <vendortag> <releasetag...>

cvs-import-fixup is a script that does the following:
  a) creates a temp area in /tmp
  b) does a "cvs -d <cvsroot> co <directory-within-repository>" in this
     temp area
  c) runs "cvs log", and parses the output to find the files that were
     created by this particular "cvs import"
  d) runs "cvs delete -f" on them
  e) runs "cvs commit"
  f) removes its temp area

3) Merge the new vendor code onto the mainline:

This can now be done if/when you want to...

If you are grabbing/merging the first "cvs import" of a given vendor branch
onto your mainline, do this:

   mkdir <new-clean-tmp-work-area2>
   cd <new-clean-tmp-work-area2>
   cvs -d <cvsroot> co -P -j 1.0 -j <releasetag> <directory-within-repository>

or equivalently, to merge that import into an existing work area, do this:

   cd <existing-work-area>
   cvs -d <cvsroot> update -d -P -j 1.0 -j <releasetag> \
       <directory-within-repository>


If you are merging a subsequent "cvs import" onto your mainline, do this:

   mkdir <new-clean-tmp-work-area2>
   cd <new-clean-tmp-work-area2>
   cvs -d <cvsroot> co -P -j <oldreleasetag> -j <newreleasetag> \
      <directory-within-repository>

or equivalently, to merge a subsequent import into an existing work area, do
this:

   cd <existing-work-area>
   cvs -d <cvsroot> update -d -P -j <oldreleasetag> -j <newreleasetag> \
      <directory-within-repository>



