Re: [Arx-users] Repo format take II

arx-users

From:	Walter Landry
Subject:	Re: [Arx-users] Repo format take II
Date:	Tue, 20 Dec 2005 00:30:07 -0800 (PST)

Kevin Smith <address@hidden> wrote:
> Walter Landry wrote:
> > 
> > 1) Cached revisions.  These are directories named by the short hash
> >    and sequence number N.
> 
> > 3) Skip-delta.  These are one-way patch files (no logs) named by the
> >    short hashes of the beginning <hash A> and finishing <hash B>
> >    revisions and a number N indicating how many patches the
> >    skip-deltas encompass.  
> 
> At a glance, the formats of those directory names all look quite 
> similar. Would there be value in naming them such that you could tell 
> just by the name what kind of directory this is? Perhaps a leading 'C' 
> or 'S'?

There is no room in the name of the skip delta for another character.
The two hashes take up 30 characters, and the skip level makes it 31.

For the other types, I don't think it will help that much to precede
them with a special character.  The file names are a jumble of hex
characters already.
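
A minimal sketch of the 31-character skip-delta name. It assumes 15-character hex short hashes, which is the length of the abcabcabcabcabc example in this message, and a skip level that fits in one decimal digit. The helper skip_delta_name is hypothetical and is not part of ArX.

```python
SHORT_HASH_LEN = 15  # assumption: length of the example short hash
HEX = set("0123456789abcdef")


def skip_delta_name(hash_a: str, hash_b: str, level: int) -> str:
    """Return '<hash A><hash B><level>' for a skip-delta (hypothetical helper)."""
    for h in (hash_a, hash_b):
        if len(h) != SHORT_HASH_LEN or not set(h) <= HEX:
            raise ValueError(f"not a {SHORT_HASH_LEN}-char hex short hash: {h!r}")
    if not 0 <= level <= 9:
        raise ValueError("skip level must fit in a single character")
    # 15 + 15 + 1 = 31 characters, which leaves no room for a type prefix
    return f"{hash_a}{hash_b}{level}"


name = skip_delta_name("abcabcabcabcabc", "123123123123123", 1)
print(name, len(name))  # abcabcabcabcabc1231231231231231 31
```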

> > 4) Terminal revisions.  These are empty files named by the short
> >    hashes of the terminal revision, followed by the character T.
> > 
> >    repo/branch/0/<hash>T
> 
> Just curious why the T is at the end rather than the beginning. I can 
> imagine some possible reasons, but would like to know the real one(s).

No particular reason.  It satisfies my internal sense of aesthetics,
but I don't feel strongly.  The hashes are fixed length, while the
suffix that follows can be any length, including zero.

> I'm not familiar with the term "terminal revision". Could something be 
> both a patch revision and a terminal revision at the same time?

Terminal revisions tell ArX that that revision should not be
considered when looking for HEAD.
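
One way to picture this: before HEAD is chosen, every candidate hash that has a matching <hash>T file in the branch directory gets dropped. The sketch assumes the caller already has the candidate hashes and the directory listing. The name live_revisions is hypothetical, and the step that picks HEAD from the survivors is omitted.

```python
SHORT_HASH_LEN = 15  # assumption: length of the example short hash


def live_revisions(candidates: list[str], entries: list[str]) -> list[str]:
    """Filter out candidates marked terminal by an empty '<hash>T' file."""
    terminated = {
        e[:-1]
        for e in entries
        if len(e) == SHORT_HASH_LEN + 1 and e.endswith("T")
    }
    return [h for h in candidates if h not in terminated]


entries = ["abcabcabcabcabcT", "123123123123123N40"]
print(live_revisions(["abcabcabcabcabc", "123123123123123"], entries))
# ['123123123123123']
```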

> > 5) Sequence revisions.  These are empty files named by the short hash
> >    of the revision, followed by a "N", and then a number N indicating
> >    the sequence number of that revision.
> > 
> >    repo/branch/0/<hash>N<N>
> 
> Same questions as for terminal revisions.

Sequence revisions tell ArX the numbering of a revision.  They do not
have any data in them.
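
The sequence number lives entirely in the file name, so reading it back is just name parsing. This sketch assumes lowercase hex short hashes of 15 characters. The helper sequence_numbers is hypothetical.

```python
import re

# <15 hex chars>N<decimal sequence number>; the hash length is an assumption
SEQ_RE = re.compile(r"^([0-9a-f]{15})N([0-9]+)$")


def sequence_numbers(entries: list[str]) -> dict[str, int]:
    """Map short hash -> sequence number from '<hash>N<N>' file names."""
    seq = {}
    for name in entries:
        m = SEQ_RE.match(name)
        if m:
            seq[m.group(1)] = int(m.group(2))
    return seq


print(sequence_numbers(["abcabcabcabcabcN256", "abcabcabcabcabcT", "README"]))
# {'abcabcabcabcabc': 256}
```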

> > 6) Tag revisions.  These are directories named like patch revisions,
> >    but inside there is log, log.sig, and URL.  The log file has all of
> >    the hashes and directory locations for tagged branches.  It does
> >    not have the hash of the revision.  The hash of the revision is
> >    actually the hash of the log file itself.  The URL file contains a
> >    serialized list of complete branch names for the tagged branches.
> >    If the branch is stored inside the same repo, then the repo part is
> >    omitted.
> 
> Again, is there value in knowing it's a tag just by the directory name?

Yes, there is.  "T" would be the obvious choice, but then I would have
to change terminal revisions to something else.  Maybe "X"?

However, I just realized that tag revisions are not really patches
from one revision to the next.  So they should really be named with
just the revision hash.  But then they would also need a sequence
number to order them.  Yick.  I need to think about this a little
more.
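
A tag revision's hash is the hash of its own log file, so computing it needs only the log's bytes. The digest and the 15-character truncation below are illustrative assumptions (SHA-256 via Python's hashlib); this message does not say which hash ArX uses. The helper tag_revision_hash is hypothetical.

```python
import hashlib

SHORT_HASH_LEN = 15  # assumption: length of the example short hash


def tag_revision_hash(log_bytes: bytes) -> str:
    """Short hash of a tag revision, taken from its log file contents."""
    return hashlib.sha256(log_bytes).hexdigest()[:SHORT_HASH_LEN]


log = b"abcabcabcabcabc project.d/256\n"  # hypothetical log contents
print(tag_revision_hash(log))
```

For a real tag directory, the argument would be the bytes of its log file. Because the hash is derived from the log, the log cannot also contain the revision's own hash.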

> > ==========
> > 
> > In order to make sure that I covered everything, I wrote up what
> > happens when Don Quixote and his sidekick Sancho Panza work on a
> > project.
> > 
> > So to start, Quixote creates a repo
> > 
> >   arx make-repo --key "address@hidden" repo
> 
> Just to be clear, wouldn't a first-time user more typically specify 
> ~/arx/repo or /var/arx/repo?

I think a first time user would want to keep everything in the same
place.  More advanced users start worrying about sharing repos, in
which case the repo might go in ~/public_html/repo.

> > this creates the directories and files
> > 
> >   repo/
> >   repo/keys
> >   repo/README
> >   repo/dirhash
> >   repo/pending/
> > 
> > Now he imports a project
> > 
> >   cd project_tree
> >   arx init ../repo,project
> >   arx commit -m "Initial import"
> 
> So (assuming what I said above) the init would more typically be:
>    arx init ~/arx/repo,project

Correct.

> I'm not thrilled by the , separating two distinct entities. Both are 
> just filenames, right?

The first part is the directory where the repo is, the second is the
project name, which is currently implemented as a directory
"project.d".  But that last part is opaque to the user.

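A sketch of how a "repo,project" argument could be mapped onto disk. It assumes the split happens at the first comma, so the repo path itself contains no comma, and it uses the project.d naming described above. The helper resolve_branch is hypothetical.

```python
from pathlib import Path


def resolve_branch(spec: str) -> Path:
    """Turn 'repo,project' into '<repo>/<project>.d'."""
    repo, sep, project = spec.partition(",")
    if not sep or not repo or not project:
        raise ValueError(f"expected 'repo,project', got {spec!r}")
    # the '.d' suffix is an implementation detail hidden from the user
    return Path(repo).expanduser() / f"{project}.d"


print(resolve_branch("../repo,project"))  # ../repo/project.d on POSIX
```
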
> Also, I would hope that the UI would allow a default repo so the user 
> would only have to specify the project name.

That is one of the things that I was hoping to get away from.  I heard
too many complaints about default repos, and it always kind of grated
on me.  I am not vetoing the idea, but I would like to see how things work
without them.  Monotone manages to get by without them.

> > He creates more revisions.  When he commits revision 32, that also
> > creates a skip-delta back to the first revision
> 
> I thought skip-deltas typically relied on the random creation of links. 

I do not know what you mean here.

> Would this actually be hard-coded at 32?

That is my current thought.  That makes the total required space
N(1+log_32(N)), which for 65000 revisions is about a factor of 4 bigger
than N.  That is a worst case scenario, and I don't think that
Subversion has seen such big factors in practice.  I will certainly
run tests on the gcc repo before setting anything in stone.
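
The factor of 4 follows when the logarithm is base 32, with one skip level per factor of 32. This sketch assumes that every revision stores one patch, and that every revision whose number is a multiple of 32^k also stores a skip-delta spanning 32^k patches, with a worst-case size of 32^k single patches. The helper name is hypothetical.

```python
import math


def worst_case_patch_units(n: int, base: int = 32) -> int:
    """Worst-case stored size, in single-patch units, for n revisions."""
    total = n        # one ordinary patch per revision
    span = base      # level-1 skip-deltas span 32 patches, level 2 spans 1024, ...
    while span <= n:
        total += (n // span) * span  # whole skip-deltas at this level
        span *= base
    return total


n = 65000
print(worst_case_patch_units(n) / n)  # about 3.5 when counting whole skip-deltas
print(1 + math.log(n, 32))            # about 4.2 from the closed form
```

Rounding down to whole skip-deltas gives a somewhat smaller ratio than the closed form N(1+log_32(N)), which treats the number of levels as continuous.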

Also, with 32 that gives you 31+31+31+2=95 patches that you need for
revision 65535, and I wanted to keep that number under 100 (why 100?
No particular reason).

However, once the number is chosen, it is pretty much set in stone.
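
The patch count can be checked with a short sketch. It assumes revisions are numbered from 0 and that skip-deltas are aligned on multiples of 32^k. Under those assumptions the greedy choice (the largest skip-delta that fits, then the next level down) reduces to summing the base-32 digits of the revision number. The helper patches_needed is hypothetical.

```python
def patches_needed(rev: int, base: int = 32) -> int:
    """Number of patches and skip-deltas needed to reach `rev` from revision 0."""
    count = 0
    while rev:
        rev, digit = divmod(rev, base)
        count += digit  # `digit` skip-deltas (or plain patches) at this level
    return count


print(patches_needed(65535))  # 94
print(max(patches_needed(r) for r in range(65536)))  # 94
```

With this numbering the sketch counts 94 rather than 95 for revision 65535. The exact figure depends on how revisions are numbered, but either way it stays under 100.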

> > When the number of patches from the beginning gets to 256, ArX creates
> > the directory
> > 
> >   repo/project.d/256
> 
> Any particular reason for 256 instead of 100 or 1000?

256*256=65536, and 60000 revisions is a design goal.  So that would
make two levels of 256 directories each, and should not degrade too
much.

I also considered three levels of 32 revisions each, but that seemed
like it would kill us in latency.

But I definitely plan to do some testing before I settle on any number.
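
A sketch of the bucketing. It assumes each directory is named after the first sequence number it holds, as with repo/project.d/256 and repo/branch/0 in this message. The helper bucket_dir is hypothetical. With 256 revisions per directory, a 65536-revision branch needs at most 256 directories.

```python
from pathlib import Path

BUCKET = 256  # revisions per directory, per the figure above


def bucket_dir(branch_dir: Path, seq: int) -> Path:
    """Directory for sequence number seq: 0, 256, 512, ... (sketch)."""
    if seq < 0:
        raise ValueError("sequence numbers start at 0")
    return branch_dir / str(seq - seq % BUCKET)


for seq in (0, 255, 256, 65535):
    print(seq, bucket_dir(Path("repo/project.d"), seq))
```

Revisions 0 and 255 share directory 0, revision 256 starts directory 256, and revision 65535 lands in directory 65280.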

> > and any skip-deltas encompassing it.  This only works if it is at the
> > tip of revisions.  He works on some mildly experimental stuff, but it
> > does not work out.  So he terminates that microbranch
> > 
> >   arx terminate ../repo,address@hidden
> 
> Not sure about that command name, but that discussion can come later.
> 
> > 
> > which creates an empty file
> > 
> >   repo/project.d/256/abcabcabcabcabcT
> 
> Oh, so a "terminal revision" is actually a "terminated revision". I was 
> thinking terminal==leaf.

Actually, a "terminal revision" makes a terminated microbranch.
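
Terminating a revision only requires creating an empty marker file. This hypothetical terminate helper is a sketch and not the actual implementation of arx terminate. It assumes the caller has already worked out which bucket directory holds the revision.

```python
import tempfile
from pathlib import Path


def terminate(bucket: Path, short_hash: str) -> Path:
    """Mark a revision as terminal by creating the empty file <hash>T."""
    marker = bucket / f"{short_hash}T"
    marker.touch(exist_ok=True)  # idempotent: terminating twice is harmless
    return marker


with tempfile.TemporaryDirectory() as tmp:
    marker = terminate(Path(tmp), "abcabcabcabcabc")
    size = marker.stat().st_size
    print(marker.name, size)  # abcabcabcabcabcT 0
```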

> > Don Quixote's trusty sidekick, Sancho Panza, decides to branch his
> > repo.  At first, he just mirrors the entire repo
> > 
> >   arx make-repo panza_repo
> >   arx propagate /home/quixote/repo panza_repo
> 
> Again, command name choice discussions can wait.

That is straight from monotone.  Monotone only uses propagate within a
repo, while ArX would use it between repos as well.

> > which just copies everything over.  He periodically resyncs, and the
> > dirhash files mean that he only has to list directories that have
> > changed.  He hacks by getting revisions out of his own repo,
> > committing, and merging.
> > 
> >   arx get panza_repo,project project_tree
> >   cd project_tree
> >   <hack>
> >   arx commit -m "cheaper, faster, better"
> >   arx propagate /home/quixote/repo ../panza_repo
> >   arx merge
> 
> So "propagate" would merge at a repo level? It's late and I'm tired so
> I'll assume I'm missing something.

I am not sure what you mean by "merge".  It is not doing what monotone
does during "merge".  Propagate just copies revisions around.
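
Per the quoted text, the dirhash files let a mirror list only the directories that have changed. This message does not describe the dirhash format. The sketch therefore assumes each repo can produce a mapping from directory name to dirhash value. With that mapping, deciding what propagate needs to list is a simple comparison. The helper changed_dirs is hypothetical.

```python
def changed_dirs(src: dict[str, str], dst: dict[str, str]) -> list[str]:
    """Directories whose dirhash differs from, or is missing in, the mirror."""
    return sorted(d for d, h in src.items() if dst.get(d) != h)


quixote = {"0": "aa", "256": "bb", "512": "cc"}  # hypothetical dirhash values
panza = {"0": "aa", "256": "xx"}
print(changed_dirs(quixote, panza))  # ['256', '512']
```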

> > and all of the other revisions.  Sancho then tells his project tree
> > about the branch movement
> > 
> >   arx relocate ../panza_repo
> 
> Hmm.
> 
> (snippage)
> 
> Overall, I feel a bit overwhelmed by the UI in your examples. I suspect 
> it's a combination of:
> 
> 1. The example itself intentionally covers some unusual cases

Yep.  Though I think relocate and propagate are two commands that
people are going to need to use.

> 2. The commands are different from ArX and every other SCM on the planet

Some of the commands are just not possible in other systems
(terminate, relocate).  Supporting remote repos with no-history
branches makes things interesting.

> 3. Did I mention I'm tired?
> 
> Back to the actual topic: I don't have any other useful comments on the 
> repo structure. Based on my tiny knowledge of such things, it seems sane.

Cheers,
Walter
