
Re: [Arx-users] Further thoughts on ArX and simplicity


From: Kevin Smith
Subject: Re: [Arx-users] Further thoughts on ArX and simplicity
Date: Mon, 18 Jul 2005 22:06:46 -0400
User-agent: Mozilla Thunderbird 1.0.2 (X11/20050404)

Walter Landry wrote:

This process will require changing the archive format.  The places
that I can think of are

  1) Continuations

    In logs, the "Continuation-of" header tells us where this revision
    branched from.  This is used for all branching.  It currently has
    the archive and revision.  It could be changed to have just the
    revision, and then there would be a "FORK_URL" file which would
    have the url of the previous branch.  If the location of the
    archive changes, we only need to change that one file.  Putting it
    in a separate file means that it won't interfere with checksums.
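The checksum point in item 1 can be illustrated with a rough sketch. Everything here is an assumption made for illustration: the file names, the header layout, the use of SHA-256, and the helper names are hypothetical and are not ArX's actual format or code. The only idea taken from the proposal is that the location lives in a file that is excluded from the checksum.

```python
import hashlib

# Hypothetical: files that are deliberately left out of the checksum.
UNCHECKED_FILES = frozenset({"FORK_URL"})


def revision_checksum(files: dict[str, str]) -> str:
    """Hash every file of a revision except the unchecked ones."""
    digest = hashlib.sha256()
    for name in sorted(files):
        if name in UNCHECKED_FILES:
            continue
        digest.update(name.encode("utf-8") + b"\0")
        digest.update(files[name].encode("utf-8") + b"\0")
    return digest.hexdigest()


def relocate(files: dict[str, str], new_url: str) -> dict[str, str]:
    """Return a copy of the revision whose FORK_URL points somewhere new."""
    moved = dict(files)
    moved["FORK_URL"] = new_url + "\n"
    return moved


# Proposed layout: the log names only the revision, the URL sits in FORK_URL.
revision = {
    "log": "Continuation-of: parent-revision\nSummary: start a branch\n",
    "FORK_URL": "http://example.org/old-archive\n",
}
moved = relocate(revision, "http://example.net/new-archive")
same_checksum = revision_checksum(revision) == revision_checksum(moved)

# Layout with the location embedded in the log: moving the archive
# means editing the log, which changes the checksum.
embedded = {"log": "Continuation-of: old-archive parent-revision\n"}
embedded_moved = {"log": "Continuation-of: new-archive parent-revision\n"}
embedded_same = revision_checksum(embedded) == revision_checksum(embedded_moved)

print(same_checksum, embedded_same)
```

In this sketch, relocating the archive touches only `FORK_URL`, so the checksum over the remaining files is unchanged, whereas rewriting a location stored inside the log would invalidate it.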

Can you point to any design docs that describe the existing archive format? Poking around a bit, it looks like everything is stored in binary (ick). What are the main types of files in _arx, and, at a very high level, what is each one for and what does it contain? Such information, even if it's just an email posted to the list, would be a valuable resource for anyone interested in fiddling with the code.

Obviously a "log" here is not what I think of as a "log". Perhaps we could come up with a better name for it to avoid confusion.

  2) Tags

    Tags have the full archive/branch,revision name of what they refer
    to.  Again, we could make it just have the revision, and have a
    "TAG_URLS" file that contains all of the necessary urls.

  3) Logs

    The archive name is explicitly listed in the logs, and any new
    patches (e.g. from a merge) also have archive names.  We would get
    rid of these archive names completely.  You might think that it
    would make it harder to find where a patch comes from, since you
    no longer have the "address@hidden" information to guide you
    when googling.  However, you still have the creator's name, which
    is likely just as good.

Hm.

With these changes, I don't think that we have to store archive names
anywhere.  However, there is now a large chance of getting conflicts,
because we no longer have the uniquifier "address@hidden" at
the beginning.

Ugh.

To make a usable system, I think we will have to
implement something like "hashes for revisions" [1].  In addition,
they will both change the archive format, and it would be good to get
all of the changes done at once.

I agree that if it's necessary to not track archives, then using "revision hashes" of some form makes sense. It seems to be all the rage among the kids these days :-)

Unfortunately, I have been thinking over the hashes for revisions
work, and I found one problem: we don't know what the hash will be
before we create the patch.  That means that we don't know how to name
the patch log.  Systems that don't support cherry-picking can get away
with it, because there is always a context for a log.

Can you describe why we need to name the patch log before creating the patch?

One solution is to not use checksums in the names, and instead use
random numbers.  This has the same collision-resistant properties as a
hash, but it doesn't have the self-verifying properties of a hash.
Normally, you don't care because you're checking crypto signatures.
But if your key is stolen, then an attacker could change revisions
that have already been published.

It might be possible to combine random numbers with hashes to get real
hash-based revisions, but whatever I might come up with will be an
ugly hack.  Just using random numbers will be rather straightforward.
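To make the trade-off concrete, here is a minimal sketch of the two naming schemes. It assumes SHA-256 and 256-bit random names purely for illustration; the function names are hypothetical and nothing here reflects how ArX names revisions.

```python
import hashlib
import secrets


def hash_name(patch: bytes) -> str:
    """Name derived from the content; known only once the patch exists."""
    return hashlib.sha256(patch).hexdigest()


def random_name(nbytes: int = 32) -> str:
    """Name chosen up front, independent of the content."""
    return secrets.token_hex(nbytes)


def verify_hash_name(name: str, patch: bytes) -> bool:
    """A hash-based name can be re-checked against the content."""
    return hash_name(patch) == name


original = b"patch contents as published"
tampered = b"patch contents changed by an attacker"

hashed = hash_name(original)
ok_original = verify_hash_name(hashed, original)
ok_tampered = verify_hash_name(hashed, tampered)

# A random name can be picked before the patch is written, but there is
# nothing to recompute: it says nothing about the content it labels.
chosen_early = random_name()

print(ok_original, ok_tampered, chosen_early)
```

With a content-derived name, a modified patch no longer matches its name. With a random name, detecting the modification depends entirely on the signature, which is the stolen-key concern raised above.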

Hm. Random numbers seem slightly more prone to collision (due to bad random number generators or insufficient entropy). Probably ok if they are long enough. I still want to really understand the stuff above so I can see why this is necessary.
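As a rough check on "long enough", the usual birthday approximation gives the chance that any two of n uniformly random b-bit names collide as about 1 - exp(-n(n-1) / 2^(b+1)). The sketch below assumes a sound random number generator; a weak generator or too little entropy, the concern above, would make real collisions more likely than this estimate. The function name and the sample sizes are chosen only for illustration.

```python
import math


def collision_probability(n_names: int, bits: int) -> float:
    """Birthday approximation for n uniformly random names of `bits` bits."""
    exponent = n_names * (n_names - 1) / (2 * 2**bits)
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)


short_names = collision_probability(100_000, 32)
long_names = collision_probability(10**9, 128)

print(f"32-bit names, 100,000 revisions: {short_names:.3f}")
print(f"128-bit names, 10^9 revisions:   {long_names:.3e}")
```

Under these assumptions, short names collide readily at modest scale, while names of 128 bits or more make accidental collisions negligible.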

I guess my hope would be that my "fresh eyes" might be able to spot some potential design simplifications, where your years in the arch/ArX world may have biased you in certain directions.

So how can we get from here to there? I'm a huge fan of incremental development, so I would really like to have a series of changes where the system is never broken.

Perhaps the first step would be to do the work described below. That would dramatically cut down the number of places that would need to be changed to reflect the big archive naming paradigm shift.

When I said "work below" I was referring to simplifying the UI to use appropriate defaults and inferences to avoid every possible need for the user to specify an archive. I think that's a big win even if we never take the additional steps we're discussing.

After that, could the remote branch work be done first (switching to be URL-based), without affecting other parts of the system? If so, that seems like a good first step.

I would love to figure out a way to implement remote branching without having to overhaul the archive format.

Next, I think we could allow existing commands and data files to accept and store URLs where they currently take archive names. Once that's in place, remove all cases where new archive names can be created by existing commands.

The final cleanup would be migration scripts to eliminate legacy archive registries, and switch any legacy archive names in data files over to the new URL method.

Does that sound about right?


More or less, although the middle steps are intertwined.

Ok. Thanks.

Kevin



