
Re: [Arx-users] Further thoughts on ArX and simplicity


From: Walter Landry
Subject: Re: [Arx-users] Further thoughts on ArX and simplicity
Date: Mon, 18 Jul 2005 16:49:24 -0700 (PDT)

Kevin Smith <address@hidden> wrote:
> Walter Landry wrote:
> > Kevin Smith <address@hidden> wrote:
> > Hmm.  You've got me thinking.  For a lightweight branch, we could
> > store the URL of the main archive, but as a piece of changeable
> > metadata.  It would not be part of the revision, so it would not have
> > its hash computed.  It would just be lying around in the archive (in
> > the CONTINUATION file?), and tell you where this branch forked from.
> > It would just be an advisory thing, since in the end we would still
> > compute the hash of everything and check signatures.  But that is
> > really all that the archive names were anyway.  It could also be
> > changed when URL's change.
> > 
> > I think this would allow you to do everything you want, and still keep
> > me happy.
> 
> So the ArX archive registry would disappear.

Correct.

> Any command that currently takes an archive name would now take a
> URL.

That is already true.

> At some point we could invent a simple alias system, but that can
> wait. Is that correct so far?

Yep.

> Lightweight branches. First, I think we need a different term, because 
> other systems use that to mean a branch that's stored within the same 
> repository, as opposed to creating a new branch by creating a new 
> working directory (that contains that branch's repo). So I propose 
> something like "remote branches" or "distributed branches".
> 
> I really don't know much about remote branches, so you'll have to design 
> and perhaps implement anything in that area.

This process will require changing the archive format.  The places
that I can think of are

  1) Continuations

    In logs, the "Continuation-of" header tells us where this revision
    branched from.  This is used for all branching.  It currently has
    the archive and revision.  It could be changed to have just the
    revision, and then there would be a "FORK_URL" file which would
    have the URL of the previous branch.  If the location of the
    archive changes, we only need to change that one file.  Putting it
    in a separate file means that it won't interfere with checksums.

  2) Tags

    Tags have the full archive/branch,revision name of what they refer
    to.  Again, we could make it just have the revision, and have a
    "TAG_URLS" file that contains all of the necessary URLs.

  3) Logs

    The archive name is explicitly listed in the logs, and any new
    patches (e.g. from a merge) also have archive names.  We would get
    rid of these archive names completely.  You might think that it
    would make it harder to find where a patch comes from, since you
    no longer have the "address@hidden" information to guide you
    when googling.  However, you still have the creator's name, which
    is likely just as good.
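
To illustrate why keeping the URL in a separate file does not interfere
with checksums, here is a minimal Python sketch.  It is not ArX code: the
dictionary layout, the revision name in the log, the example URLs, and the
choice of SHA-256 are all assumptions made for illustration.  Only the
"Continuation-of" header and the "FORK_URL" file name come from the
description above.

```python
import hashlib


def revision_hash(log_text: str, patch_bytes: bytes) -> str:
    """Hash only the immutable parts of a revision (log and patch)."""
    h = hashlib.sha256()
    h.update(log_text.encode("utf-8"))
    h.update(b"\0")  # separator so log and patch cannot run together
    h.update(patch_bytes)
    return h.hexdigest()


# Hypothetical in-memory stand-in for one revision in an archive.
archive = {
    # The log names only the revision it continues from, no archive name.
    "log": "Continuation-of: main--1.0,patch-12\nSummary: start branch\n",
    "patch": b"...patch contents...",
    # Advisory, changeable metadata: never fed into the hash.
    "FORK_URL": "http://example.org/arx/main",
}

before = revision_hash(archive["log"], archive["patch"])

# The parent archive moves; only the advisory file is edited.
archive["FORK_URL"] = "http://example.net/mirror/main"

after = revision_hash(archive["log"], archive["patch"])
assert before == after  # the revision's checksum is unaffected
```

Because the URL never enters the hash, it can be corrected whenever an
archive moves without invalidating checksums or signatures.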

With these changes, I don't think that we have to store archive names
anywhere.  However, there is now a much greater chance of name
conflicts between revisions, because we no longer have the uniquifier
"address@hidden" at the beginning.  To make a usable system, I think we
will have to implement something like "hashes for revisions" [1].  In
addition, both this change and hashes for revisions will change the
archive format, so it would be good to get all of the changes done at
once.

Unfortunately, I have been thinking over the hashes for revisions
work, and I found one problem: we don't know what the hash will be
before we create the patch.  That means that we don't know how to name
the patch log.  Systems that don't support cherry-picking can get away
with it, because there is always a context for a log.

One solution is to not use checksums in the names, and instead use
random numbers.  This has the same collision-resistant properties as a
hash, but it doesn't have the self-verifying properties of a hash.
Normally, you don't care because you're checking crypto signatures.
But if your key is stolen, then an attacker could change revisions
that have already been published.

It might be possible to combine random numbers with hashes to get real
hash-based revisions, but whatever I might come up with will be an
ugly hack.  Just using random numbers will be rather straightforward.
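
The trade-off can be sketched in a few lines of Python.  This is only an
illustration under assumed details: the function names, the 128-bit
length, the "Revision:" log header, and the use of SHA-256 are all
hypothetical, not anything ArX does.

```python
import hashlib
import secrets


def random_revision_name() -> str:
    """A 128-bit random name; it can be chosen before any content exists."""
    return secrets.token_hex(16)


def hash_revision_name(patch_bytes: bytes) -> str:
    """A content-derived name; it needs the finished patch as input."""
    return hashlib.sha256(patch_bytes).hexdigest()


# Random name: known up front, so the patch log can be named and then
# included in the patch itself.
name = random_revision_name()
log = f"Revision: {name}\nSummary: fix typo\n"
patch = log.encode("utf-8") + b"...diff contents..."

# Hash name: only computable afterwards.  Writing it into the log would
# change the patch, and therefore the hash.
hashed = hash_revision_name(patch)

# Self-verification: tampering changes the hash-based name, while the
# random name says nothing about the contents.
tampered = patch + b"malicious change"
assert hash_revision_name(patch) == hashed
assert hash_revision_name(tampered) != hashed
```

With random names, detecting a modified revision depends entirely on the
signature check, which is exactly the weakness when a key is stolen.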

> So how can we get from here to there? I'm a huge fan of incremental 
> development, so I would really like to have a series of changes where 
> the system is never broken.
> 
> Perhaps the first step would be to do the work described below. That 
> would dramatically cut down the number of places that would need to be 
> changed to reflect the big archive naming paradigm shift.
> 
> After that, could the remote branch work be done first (switching to be 
> URL-based), without affecting other parts of the system? If so, that 
> seems like a good first step.
> 
> Next I think we could allow existing commands and data files to accept 
> and store URL's where they currently take archive names. Once that's in 
> place, remove all cases where new archive names can be created by 
> existing commands.
> 
> The final cleanup would be migration scripts to eliminate legacy archive 
> registries, and switch any legacy archive names in data files over to 
> the new URL method.
> 
> Does that sound about right?

More or less, although the middle steps are intertwined.

Cheers,
Walter

[1] http://lists.gnu.org/archive/html/arx-users/2005-04/msg00000.html



