Re: [Arx-users] Repo format take II

From: Walter Landry
Subject: Re: [Arx-users] Repo format take II
Date: Wed, 21 Dec 2005 16:29:16 -0800 (PST)

Kevin Smith <address@hidden> wrote:
> Walter Landry wrote:
> > Kevin Smith <address@hidden> wrote:
> > 
> >>Walter Landry wrote:
> >>
> >>At a glance, the formats of those directory names all look quite 
> >>similar. Would there be value in naming them such that you could tell 
> >>just by the name what kind of directory this is? Perhaps a leading 'C' 
> >>or 'S'?
> > 
> > There is no room in the name of the skip delta for another character.
> > The two hashes take up 30 characters, and the skip level makes it 31.
> > 
> > For the other types, I don't think it will help that much to precede
> > them with a special character.  The file names are a jumble of hex
> > characters already.
> Ok. So it's ok to have two completely different "types" of directories 
> using an identical naming scheme.

We must have misunderstood each other.  In my previous reply, I
thought you were worried about someone trying to figure out how ArX
stores its data by inspecting the directories.  It is certainly a
design goal to be able to tell exactly what type of directory it is
without looking inside.  Tag revisions lacking a postfix "T" was a bug
in that regard.  To be specific, the formats are

1) <hash><N>
2) <hash B><hash A>
3) <hash B><hash A><N>
4) <hash>X
5) <hash>N<N>
6) <hash B><hash A>T

where <hash> is a 15-character hex string, <N> is a number, and
everything else is a literal string.  The formats all look
distinguishable to me as long as no one makes more than 10^14
revisions.  They may just be difficult for a newcomer who does not
know the formats.
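
As a rough illustration of why the six formats above can be told apart by
name alone, here is a minimal Python sketch.  It is not ArX code: the
function name classify, the lowercase-hex assumption, and the 14-digit cap
on <N> in format 1 are assumptions made for the example.  The return value
is simply the format number from the list above.

```python
import re

# Assumption: hashes are lowercase hex, 15 characters each.
HEX15 = r"[0-9a-f]{15}"

# (format number from the list above, pattern).  Because digits are also
# hex characters, format 1 is capped at 14 digits of <N> so that it can
# never be as long as formats 2 and 3.
PATTERNS = [
    (1, re.compile(rf"{HEX15}[0-9]{{1,14}}")),
    (2, re.compile(rf"{HEX15}{HEX15}")),
    (3, re.compile(rf"{HEX15}{HEX15}[0-9]+")),
    (4, re.compile(rf"{HEX15}X")),
    (5, re.compile(rf"{HEX15}N[0-9]+")),
    (6, re.compile(rf"{HEX15}{HEX15}T")),
]


def classify(name):
    """Return the format number (1-6) matching a directory name, or None."""
    for number, pattern in PATTERNS:
        if pattern.fullmatch(name):
            return number
    return None
```

With these patterns, a format 1 name whose <N> reached 15 digits would be
30 hex-looking characters long and so would be read as format 2, which is
one way to see where a limit of about 10^14 revisions comes from.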

I decided to keep the tag revisions as a "from" and "to" hash, since
when you make a new tag to override an old tag, it is useful to know
which tag you are overriding.

> >>I'm not familiar with the term "terminal revision". Could something be 
> >>both a patch revision and a terminal revision at the same time?
> > 
> > Terminal revisions tell ArX that that revision should not be
> > considered when looking for HEAD.
> >
> > Actually, a "terminal revision" makes a terminated microbranch.
> >
> I still don't like terminal revisions, since it sounds too much like a 
> leaf revision, which would in fact be a HEAD. How about "terminating 
> revision". Or "dead revision" or "killed revision" or something that 
> reflects that this case only happens when the user kills it. Along those 
> lines, a name ending with X would make sense, although T is ok too, if 
> it's not needed for Tag.

I agree that "terminating revision" is better.  "Terminal revision",
"dead revision" and "killed revision" make it sound like it is the
revision that is killed, not the microbranch.

> > 256*256=65536, and 60000 revisions is a design goal.  So that would
> > make two levels of 256 directories each, and should not degrade too
> > much.
> >
> I know it really doesn't matter, but using numbers like 256 and 65000 
> add to the nerdy-ness of the project, which already has a nerds-only 
> inside-joke as a name (ArX). Unless there are compelling technical 
> reasons for using these binary numbers (which will at least be partly 
> exposed via documentation), I would prefer using numbers that are easier 
> for non-nerds to deal with, like 100 and 100000.

These numbers should never be exposed.  As for technical reasons, it
is in some ways simpler to think about and deal with powers of two.
It is not a really significant difference, but I do feel more
comfortable using them.

But the benchmarks will tell me what numbers to use.
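
To make the arithmetic concrete, here is a minimal Python sketch of
power-of-two bucketing.  It assumes each directory is named after the first
sequence number it holds (0, 256, 512, ...); the names bucket_for and
REVISIONS_PER_DIR are hypothetical, and 256 is only the figure under
discussion, not a settled value.

```python
# Assumed bucket size; benchmarks may well pick a different number.
REVISIONS_PER_DIR = 256


def bucket_for(sequence_number, per_dir=REVISIONS_PER_DIR):
    """Name of the directory that would hold a given sequence number."""
    if sequence_number < 0:
        raise ValueError("sequence numbers are non-negative")
    # Round down to the nearest multiple of per_dir.
    return str(sequence_number - sequence_number % per_dir)


def capacity(per_dir=REVISIONS_PER_DIR):
    """Revisions that fit in per_dir directories of per_dir entries each."""
    return per_dir * per_dir
```

For example, bucket_for(700) gives "512", and capacity() gives 65536, which
covers the 60000-revision design goal.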

> Random thought: Why not name the top-level directories 1, 2, 3 instead 
> of 0, 256, 512? There could be a value written in a repo config file 
> that indicates how many revisions go in each directory, so it's not 
> locked in stone for every repo on every underlying filing system for all 
> time. Or maybe 0000, 0001, ... so they are all the same length, and are 
> sortable.

Using 0, 256, 512, etc. seems like a simple, transparent way of naming
directories.  Using numbers like 1, 2, 3 etc. adds another layer of
indirection that I do not see the need for.  Are you thinking of
making the number of revisions per directory configurable?  I am
trying to come up with a one-size-fits-all format, so no one has to
think about how to create repos.  Benchmarks will tell me whether I am

> >>>which just copies everything over.  He periodically resyncs, and the
> >>>dirhash files mean that he only has to list directories that have
> >>>changed.  He hacks by getting revisions out of his own repo,
> >>>committing, and merging.
> >>>
> >>>  arx get panza_repo,project project_tree
> >>>  cd project_tree
> >>>  <hack>
> >>>  arx commit -m "cheaper, faster, better"
> >>>  arx propagate /home/quixote/repo ../panza_repo
> >>>  arx merge
> >>
> >>So "propagate" would merge at a repo level? It's late and I'm tired so 
> >>I'll assume I'm missing something.
> > 
> > 
> > I am not sure what you mean by "merge".  It is not doing what monotone
> > does during "merge".  Propagate just copies revisions around.
> Ah, the monotone model. So propagate would pull other folks' branches 
> into my repo, but wouldn't affect my own branches. What if I have two 
> repos and have worked on the "same branch" in both of them? I can see 
> that it could pull in my "other" revisions without conflict because they 
> are named by their hash. But wouldn't the sequence numbers collide? Last 
> time I looked at monotone, it didn't have sequence numbers.

Yes, the sequence numbers collide.  But that is ok, because the hashes
will not.  The sequence numbers are not like in bzr, where each repo
defines its own sequence numbers.  Sequence numbers are global, but
they have an attached hash to disambiguate.  The idea is that for most
normal operations, the sequence numbers will not collide.
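
A toy Python model may help show why a collision in sequence numbers is
harmless.  It is only a sketch under stated assumptions: a repo is modelled
as a set of (sequence number, hash) pairs, and the functions propagate and
colliding_sequence_numbers are hypothetical names, not the ArX
implementation.

```python
def propagate(source, destination):
    """Copy revisions missing from destination; return the ones added.

    Each revision is identified by a (sequence_number, hash) pair, so two
    revisions sharing a sequence number remain distinct entries.
    """
    added = source - destination
    destination |= added
    return added


def colliding_sequence_numbers(repo):
    """Map each sequence number held by more than one hash to its hashes."""
    by_sequence = {}
    for sequence_number, revision_hash in repo:
        by_sequence.setdefault(sequence_number, set()).add(revision_hash)
    return {
        seq: hashes for seq, hashes in by_sequence.items() if len(hashes) > 1
    }
```

If two repos each hold a different revision 3 on the same branch,
propagating one into the other keeps both, and
colliding_sequence_numbers reports sequence number 3 with two hashes.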

> > Yep.  Though I think relocate and propagate are two commands that
> > people are going to need to use.
> Yes, but only after they are familiar with the basics. If I understand 
> relocate (and I don't think I do), it would only be used rarely.

If you want to work on someone else's project without downloading the
whole history, then you have to use relocate.  Relocate would replace
the functionality currently in "fork".

