From: Richard Stallman
Subject: [Savannah-hackers] address@hidden: revision control systems]
Date: Fri, 18 Jan 2002 17:18:26 -0700 (MST)

Could some of you please look at `arch' and tell me
your technical opinion?  Does it have promise?
How does it compare with CVS and Subversion?

The second message says where you can get it.

Please respond, to me and all the recipients, if you
are willing to investigate this.  I want to know that someone
is doing it, that this has not been overlooked.

------- Start of forwarded message -------
Date: Fri, 28 Dec 2001 15:46:17 -0800 (PST)
From: Tom Lord <address@hidden>
To: address@hidden
Subject: revision control systems


I've written a free software source code management and revision
control system called `arch'.  I think `arch' compares well with
CVS and Subversion and some of the commercial competition.

Some quick highlights of the feature list are:

        + distributed databases -- each hacker or group can host their
          own branches.  There's a global (world wide) name-space for
          lines of development and revisions.  Branches can be formed
          from any repository to any other and merge operations can 
          span repository boundaries without needing to actually
          duplicate the full contents of a repository at each site.
          
        + fancy merging -- `arch' has support for various styles
          of history-sensitive branch merging.  The way branches
          and patch-sets interact with distributed repositories
          makes it practical to distribute the responsibilities
          for patch-review and merging.

        + renames handled -- of course file and directory renames
          are handled accurately.  So are symbolic links and file
          permissions.

        + unobtrusive operation -- `arch' is designed to stay out
          of your way while making changes and rearranging files.  It
          is designed to have a clean and self-documenting
          command-line interface having the finest characteristics of
          good Unix tools.

`arch' is, at its core, a collection of shell scripts and a tiny bit
of new C code.  It brings many classic shell-utils, FTP, diff, and
patch together and turns them into a distributed version control
system.  In spite of the simplicity, `arch' is not a toy: it's quite
sophisticated and, in my opinion, elegant.  It captures the style of
diff/patch use that we used to use before remote-CVS took over the
world, fills in some gaps, and packages the whole deal behind a nice
(command line) user interface.  Competing RC systems are far more
complex than they need to be.

Enclosed below is a longer list of `arch' features.

Could you let me know if `arch' is interesting to you?  I'm trying to
find a commercial sponsor to help move it forward.  One obstacle I've
encountered is that arch is new so there isn't yet "enthusiastic
community support" for it -- a sort of chicken-and-egg problem.

`arch' is newer than other systems -- so it is less tested.  From a
hacking point of view, what I'd really want to be able to do is a few
months of intensive and focused testing and tuning, culminating in
applying it to some larger projects.

A user's guide for arch, describing most of the features and how to
use them, is available at:

        http://www.regexps.com/super-secret/arch.html

regards,
-t


                 Key Features: Branching and Merging

* Fancy Tagging, Branching, and Merging

  `arch' is designed with unprecedented support for developing on
  branches and performing complex merges with automated assistance.

  Forming a branch (or tag) is inexpensive in both space and time.
  Tags are revisioned -- meaning that complete history is kept of how 
  a tag has been applied.

  For merging, `arch' provides a number of operations:

  `update': a `CVS'-style merge operator (diff the working copy
  against a common ancestor (from any branch) and apply those diffs to
  the latest revision).

  `replay': a `Subversion'-style history sensitive merge operator
  (apply to the working copy all deltas that are found in the latest
  revision (from any branch) but not previously applied to the working
  copy).

  `reconcile': an operation unique to `arch' which plans a
  multi-branch `replay'-based merge, finding an ordering of patches
  from those branches which minimizes sources of potential conflicts.

  `i-merge': another operation ("idempotent merge") unique to `arch':
  i-merge forms a revision whose delta from its ancestors consists
  entirely of merges with other branches (any combination of `update'
  and `replay').  `replay' and `update' can treat such deltas
  specially, skipping them for trees that have already undergone
  similar merges.  `i-merge' makes history-sensitive merging more
  effective and helps a team of programmers avoid having to repeatedly
  solve the same set of merge conflicts.  (The `i-merge' feature is
  the only one mentioned in this message not done yet.  Based on my
  experience implementing similar features, `i-merge' needs 2-3 days to
  get working and pass initial testing.  I've postponed implementing
  it until I have a chance to work on `arch' full-time again -- using
  the planned feature as a kind of cognitive book-mark to recover my
  state after being away from the code for a few weeks.)

  `replay --exact' and `replay --list': operations which allow you to
  apply revision deltas in any user-selected order, while still taking
  advantage of history-sensitivity.

  `mkpatch' and `dopatch': `arch''s "next generation" replacements for
  `diff -r -c' and `patch'.  These can be used to perform arbitrary
  delta computation and applications on working copies.
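
  The history-sensitive idea behind `replay' can be sketched in a few
  lines of Python.  This is an illustration only, not `arch' code: the
  tree model (a dict of path to contents), the patch names, and the
  helpers `replay' and `set_file' are all assumptions made for the
  example, and real deltas would be merged rather than overwriting
  whole files.

```python
from typing import Callable, Dict, List, Tuple

Tree = Dict[str, str]                        # path -> file contents (toy model)
Patch = Tuple[str, Callable[[Tree], Tree]]   # (patch name, function applying its delta)


def replay(tree: Tree, applied: List[str], history: List[Patch]) -> Tuple[Tree, List[str]]:
    """Apply, in history order, every patch not already recorded in `applied`."""
    new_tree = dict(tree)
    new_applied = list(applied)
    for name, apply_delta in history:
        if name in new_applied:
            continue  # history-sensitive: skip deltas the tree already contains
        new_tree = apply_delta(new_tree)
        new_applied.append(name)
    return new_tree, new_applied


def set_file(path: str, text: str) -> Callable[[Tree], Tree]:
    """Hypothetical delta: create or overwrite one file."""
    def delta(tree: Tree) -> Tree:
        updated = dict(tree)
        updated[path] = text
        return updated
    return delta


# Patch names here are made up for the example.
history = [
    ("patch-1", set_file("README", "hello\n")),
    ("patch-2", set_file("Makefile", "all:\n")),
    ("patch-3", set_file("README", "hello, world\n")),
]

# The working copy already contains patch-1 plus a purely local file.
working = {"README": "hello\n", "notes.txt": "local\n"}
merged, log = replay(working, ["patch-1"], history)
```

  Replaying a second time with the returned log changes nothing,
  which is the property that lets repeated merges skip work already
  done.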


* Directory and File Renames Handled Cleanly

  Changes are tracked across file and directory renames.  For example,
  if you have a local working directory and "update" against the
  repository (merge changes in the repository with local changes) -- and
  either or both the repository or your local tree has been
  "rearranged" -- the merge process takes those renames into account.
  As a practical matter, this creates an important new degree of
  freedom for developers: the freedom to "clean up" code by improving
  its organization without having to pay a high cost in revision
  control system maintenance.
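
  One way to picture a rename-aware merge is to identify files by a
  stable id instead of by path.  The Python sketch below assumes that
  model; the message does not say how `arch' tracks renames
  internally, and the names `merge_by_id', `f1', and the example
  paths are hypothetical.  Conflicts, additions, and deletions are
  deliberately left out.

```python
from typing import Dict, Tuple

# Toy model (an assumption): stable file id -> (current path, contents)
Tree = Dict[str, Tuple[str, str]]


def merge_by_id(base: Tree, local: Tree, remote: Tree) -> Tree:
    """Three-way merge keyed on file id, so renames and edits combine."""
    merged: Tree = {}
    for fid, (base_path, base_text) in base.items():
        if fid not in local or fid not in remote:
            continue  # deleted on one side; deletion handling omitted
        local_path, local_text = local[fid]
        remote_path, remote_text = remote[fid]
        # Take whichever side changed; if both changed, this sketch
        # simply prefers the local side instead of reporting a conflict.
        path = local_path if local_path != base_path else remote_path
        text = local_text if local_text != base_text else remote_text
        merged[fid] = (path, text)
    return merged


base = {"f1": ("src/util.c", "v1")}
local = {"f1": ("lib/util.c", "v1")}    # local tree was rearranged
remote = {"f1": ("src/util.c", "v2")}   # repository edited the file in place
result = merge_by_id(base, local, remote)
```

  Because the edit and the rename are matched through the same id,
  the repository's change lands in the file at its new local path.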

  

                      Key Features: Repositories


* Distributed Revision Databases

  `arch' has a global (as in "world wide") name-space for revisions.

  `arch' seamlessly integrates all accessible revision repositories,
  both local and remote, into one large database.  Branches can span
  repository boundaries, etc.  That has big implications for open
  source processes, both intra-organizationally, and on a global
  scale.

  Each developer or organization can have a private database for
  day-to-day work, or for organization- or feature-specific branches.

  Loosely cooperating organizations can have separately administered
  repositories that, nevertheless, mutually support branching and
  merging.
  
  An unwelcome source of de-facto authority (hosting a public
  project's `CVS' repository) is undermined by `arch'.  More
  positively, `arch' lowers the barriers to coordinated
  inter-organizational development: if your repository is publicly
  readable, anybody can create branches -- there is no need to hand
  out write access to everyone who wants to play.


* Low Cost Server Administration

  `arch' remote repository access is via the FTP protocol.  An `arch'
  server can be a generic (unix-based) FTP server.

  Server administration requirements are minimal: databases can be
  created trivially and (unlike `CVS') never become wedged (except as
  a result of file system failures (or, sigh, bugs -- if there are
  any)).  Repositories can be easily migrated.  Repositories can be
  mirrored for read-only purposes.


* Atomic, Concurrent, Independent, and Durable Transactions

  Commits are atomic.  Concurrent commits to separate lines of
  development are permitted.  Commits are independent of "gets"
  (check-outs).  Commits are durable to the limits of the underlying
  file system.  If a commit hangs (say, a client dies) with locks 
  held -- those locks can be broken remotely.
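
  A common technique for atomic commits on a plain file system is to
  stage the whole revision in a temporary directory and then rename
  it into place.  The Python sketch below shows that technique under
  stated assumptions; it is not taken from `arch', the function name
  `commit_atomically' and the directory layout are hypothetical, and
  it handles only flat file names within one file system.

```python
import os
import shutil
import tempfile


def commit_atomically(repo_dir: str, revision: str, files: dict) -> str:
    """Write `files` (name -> text) as a new revision directory, all or nothing."""
    final = os.path.join(repo_dir, revision)
    # Stage inside repo_dir so the final rename stays on one file system.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=repo_dir)
    try:
        for name, text in files.items():
            with open(os.path.join(staging, name), "w") as handle:
                handle.write(text)
        # rename() is atomic on POSIX file systems: readers see either no
        # revision or the complete one.  It raises OSError if a non-empty
        # revision directory of that name already exists.
        os.rename(staging, final)
    except Exception:
        shutil.rmtree(staging, ignore_errors=True)  # leave no partial commit behind
        raise
    return final
```

  A client that dies before the rename leaves only a staging
  directory, which can be removed without affecting committed
  revisions.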



                        Key Features: Logging
  
* Useful Semi-Automated Logging

  `arch' log entries contain lots of automatically generated
  information that is useful for browsing repository history and for
  performing intelligent (history sensitive) merges.


* Automatic ChangeLog Maintenance

  `arch' can automatically generate GNU-style ChangeLog files from
  revision control log entries.  If your tree contains automatically
  generated log files, `arch' will update them during `commit', and
  after every merge operation that changes a revision's patch history.
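
  As a rough illustration of turning a log entry into a GNU-style
  ChangeLog entry, here is a Python sketch.  The header names
  (`Revision', `Creator', `Date', `Summary'), the sample log, and the
  function `changelog_entry' are assumptions for the example; the
  actual `arch' log format is described in the user's guide, not
  here.

```python
from email import message_from_string
from email.utils import parseaddr

# Made-up log entry in RFC822 style: headers, blank line, body.
LOG = """\
Revision: demo--main--0.1--patch-2
Creator: Alice Hacker <alice@example.org>
Date: 2002-01-16
Summary: fix rename handling

* merge.sh: Track moved directories.
"""


def changelog_entry(log_text: str) -> str:
    """Format one log entry as a GNU-style ChangeLog entry."""
    msg = message_from_string(log_text)
    name, address = parseaddr(msg["Creator"])
    lines = [f"{msg['Date']}  {name}  <{address}>", ""]
    lines.append(f"\tSummary: {msg['Summary']}")
    lines.append(f"\tRevision: {msg['Revision']}")
    body = msg.get_payload().strip()
    if body:
        lines.append("")
        # ChangeLog bodies are indented with a tab; keep blank lines empty.
        lines.extend("\t" + line if line else "" for line in body.splitlines())
    return "\n".join(lines) + "\n"


entry = changelog_entry(LOG)
```

  Concatenating such entries, newest first, would give a file in the
  usual ChangeLog layout.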


  
                     Key Features: User Interface

* Patch Set Browsing

  Any patch set, for a committed revision, between a working copy and
  its ancestors, or between arbitrary trees, can be summarized in an
  HTML-formatted report, with lists of renamed files and directories,
  and hyper-links to individual file deltas, added files, and removed
  files.  This is a boon to developers writing log entries and to
  patch reviewers.  One of my favorite commands has become: 

        netscape --remote "openURL(`arch what-changed --url`)"


* Command-line Driven, Self-Documenting

  `arch' is a collection of small and simple software tools.  The
  collection has very regular and thorough conventions for option
  names and defaulting behavior.  Every command has an extensive
  `--help' message describing its options and functionality.  The
  command `arch --help-commands' gives an orderly summary of all the
  commands available with brief descriptions of each.


* Far More GUI Work Possible

  `arch' is designed from the ground up to be layered under separately
  developed GUIs.  For example, `arch''s log entries contain enough
  information to drive a tool that draws the branch-merge graph of
  revision history, conveniently represented as plain-text data in
  RFC822-style message headers.
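
  To make the idea concrete, the Python sketch below reads
  RFC822-style log entries and collects the edges of a branch-merge
  graph that a GUI could then draw.  The header names `Revision' and
  `Merged-From', the revision names, and the function `merge_edges'
  are hypothetical; they stand in for whatever headers the real log
  entries carry.

```python
from email import message_from_string
from typing import List, Tuple

# Made-up log entries; each lists the revisions it was merged from.
LOGS = [
    "Revision: trunk--patch-1\n\ninitial import\n",
    "Revision: feature--patch-1\nMerged-From: trunk--patch-1\n\nbranch from trunk\n",
    "Revision: trunk--patch-2\n"
    "Merged-From: trunk--patch-1 feature--patch-1\n\nmerge feature\n",
]


def merge_edges(log_texts: List[str]) -> List[Tuple[str, str]]:
    """Return (source revision, resulting revision) edges from log headers."""
    edges = []
    for text in log_texts:
        msg = message_from_string(text)
        revision = msg["Revision"]
        # A missing header means the revision has no recorded ancestors.
        for source in (msg["Merged-From"] or "").split():
            edges.append((source, revision))
    return edges


edges = merge_edges(LOGS)
```

  The resulting edge list can be handed to any graph-drawing layer
  without that layer needing to understand the repository itself.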



                  Key Features: Performance Metrics

  
* Pretty Fast, Efficient Use of Bandwidth, Effective Use of Disk Space

  `arch' seems to be pretty fast, and for good reasons.  Tree-deltas
  (patches) are exchanged with servers as compressed tar files.
  `arch' makes clever use of client-side caching.  On my
  (unremarkable) system, `commit' processes around 10 files per
  second.  (Rigorous comparative benchmarking and final tuning remain
  to be done, however.)


* Maintainable Size

  The heart of the implementation (around 30K lines) is (ahem) almost
  entirely shell scripts and awk code.  (This is not a joke -- `arch'
  is a serious system.)  In spite of the size and implementation
  languages, `arch' is more featureful than `CVS' and seems to be
  faster at common-case operations.


* Useful Subsets Small Enough to Add to Other Source Packages

  It is practical to distribute a tiny subset of `arch' with any of
  your source packages.  Contributors without repositories can use
  that subset to prepare `arch'-compatible patches or to apply `arch'
  patch sets.


regards
-t
------- End of forwarded message -------


Date: Wed, 16 Jan 2002 01:34:23 -0800 (PST)
From: Tom Lord <address@hidden>
To: address@hidden
In-reply-to: <address@hidden> (message from Tom Lord
        on Sat, 29 Dec 2001 23:59:34 -0800 (PST))
Subject: Re: revision control systems




   Date: Sat, 29 Dec 2001 23:59:34 -0800 (PST)
   From: Tom Lord <address@hidden>



          Can you tell me where they can look at the source code?

   Not at the moment, but quite soon I hope.

   I was about to make a release after fixing some porting nits, then got
   caught up adding some new features.

   I'll re-ping you when it's ready.

   -t



I've made the first public release of `arch', a new revision control
system.  You can find it at:

        http://www.regexps.com

The user's guide is on-line, as is a simple repository browser for the
change history.  There's a read-only copy of arch's self-hosted
repository there too.  (Please let me know if the web pages give you
troubles on your particular browser -- this is the first time I've
tried using tables so heavily.)

Some of the key advantages of arch compared to CVS are:

        1. Atomic, whole-tree commits, reliable repository database.

        2. File and directory renames handled cleanly.

        3. Fancy features for branching and merging:

           For example, arch has a high level merge operator that is
           especially good for projects where multiple maintainers of
           a project each work on separate branches, merging to and
           from a shared "trunk" to stay in sync (the `star-merge'
           command, so called because the graph of trunk and branches
           has a star topology).

        4. Distributed repositories

           arch treats all accessible repositories as one big
           repository, permitting branch and merge operations to span
           repository boundaries.  "World-Wide Revision Control" :-)
           This eliminates the need for non-core contributors to
           resort to diff/patch and simplifies the change-review
           task for maintainers.

        5. Automatic ChangeLog maintenance.

        6. Configuration management for multi-package distributions.

        7. Weighs in at about 30K lines of code.

           (Some of the lines are rather wide, though :-)

arch is in pretty good shape in the sense that the core functionality
is done and I've been using it heavily, myself.  The main weaknesses,
and, hence, opportunities to contribute are:

        1. I use it only on a BSD-based system.  Though porting 
           to other platforms should be easy, it won't be a noop,
           and it hasn't been done yet.  

        2. Since revision control ought to be rock-solid reliable,
           a comprehensive test suite for arch is an important goal:
           but it's a large job. 

        3. The web interface and facilities for browsing revision
           history are a bit weaker than I'd like -- I'm working on
           that, though.

        4. No facility, yet, for automatically converting a CVS
           repository into an arch repository.

        5. No fancy GUI, yet, for drawing a graph that illustrates
           the branching and merging history of a project.

        6. No fancy GUI, yet, for running arch commands via a 
           control panel.

        7. For very large and/or active projects, some performance
           tuning is likely to be desirable.  I've been using arch on
           a tree with around 1500 files and find performance to be
           acceptable.  (By way of contrast, GCC has around 6500 files
           (at least in the old distribution I have on hand)).  I
           perform a small handful of commits per day (whereas, I
           presume, GCC gets at least dozens across all
           branches).  It is straightforward to speed up the arch
           commands that might cause problems -- they were written for
           simplicity and functionality first, omitting some obvious
           speed-ups.

-t



