Re: [Arx-users] Repo format take II
Tue, 20 Dec 2005 09:54:23 -0500
Mozilla Thunderbird 1.0.7 (X11/20051011)
Walter Landry wrote:
Kevin Smith <address@hidden> wrote:
Walter Landry wrote:
At a glance, the formats of those directory names all look quite
similar. Would there be value in naming them such that you could tell
just by the name what kind of directory this is? Perhaps a leading 'C'
There is no room in the name of the skip delta for another character.
The two hashes take up 30 characters, and the skip level makes it 31.
For the other types, I don't think it will help that much to precede
them with a special character. The file names are a jumble of hex
Ok. So it's ok to have two completely different "types" of directories
using an identical naming scheme. Either you never care which "type" a
directory is, or it's ok to read the contents of a directory to
determine its "type". I was just thinking that there might be times
(especially over slow remote links) where it would be useful to know
that you don't have to read any contents of that directory because it's
not an interesting type. I didn't have any specific cases in mind, but I
know in my own work it is usually helpful to have naming strategies that
allow type detection without extra reading.
Just curious why the T is at the end rather than the beginning. I can
imagine some possible reasons, but would like to know the real one(s).
No particular reason. It satisfies my internal sense of aesthetics,
but I don't feel strongly. The hashes are fixed length, while the
following string's length can be anything zero and up.
I'm not familiar with the term "terminal revision". Could something be
both a patch revision and a terminal revision at the same time?
Terminal revisions tell ArX that that revision should not be
considered when looking for HEAD.
> Actually, a "terminal revision" makes a terminated microbranch.
I still don't like terminal revisions, since it sounds too much like a
leaf revision, which would in fact be a HEAD. How about "terminating
revision"? Or "dead revision" or "killed revision" or something that
reflects that this case only happens when the user kills it. Along those
lines, a name ending with X would make sense, although T is ok too, if
it's not needed for Tag.
The first part is the directory where the repo is, the second is the
project name, which is currently implemented as a directory
"project.d". But that last part is opaque to the user.
Ok. They still seem like distinct entities, but it seems fine to start
this way, at least.
Also, I would hope that the UI would allow a default repo so the user
would only have to specify the project name.
That is one of the things that I was hoping to get away from. I heard
too many complaints about default repos, and it always kind of grated
on me. I am not vetoing the idea, but I would like to see how things work
without them. Monotone manages to get by without them.
He creates more revisions. When he commits revision 32, that also
creates a skip-delta back to the first revision
I thought skip-deltas typically relied on the random creation of links.
I do not know what you mean here.
Doh! I was thinking of skip-lists, which are very similar, but which are
typically generated by inserting elements in a random sequence. Never mind.
Would this actually be hard-coded at 32?
That is my current thought. That makes the total required space
N(1+log(N)), which for 65000 revisions is about a factor of 4 bigger
than N. That is a worst case scenario, and I don't think that
Subversion has seen such big factors in practice. I will certainly
run tests on the gcc repo before setting anything in stone.
Also, with 32 that gives you 31+31+31+2=95 patches that you need for
revision 65535, and I wanted to keep that number under 100 (why 100?
No particular reason).
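A quick way to check the space estimate above is to evaluate N(1+log(N)) directly. This is only a sketch: it assumes the logarithm is taken base 32 (the skip interval under discussion), and the function name is made up for illustration, not anything in ArX.

```python
import math

SKIP_BASE = 32  # assumed: the hard-coded skip interval discussed above


def space_factor(n_revisions, base=SKIP_BASE):
    """Worst-case growth factor relative to N, i.e. 1 + log_base(N)."""
    return 1 + math.log(n_revisions, base)


if __name__ == "__main__":
    # Growth factor for the 65000-revision case mentioned in the thread.
    print(f"{space_factor(65000):.2f}")
```

For 65000 revisions this comes out a little above 4, in line with the "factor of 4" figure.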
> 256*256=65536, and 60000 revisions is a design goal. So that would
> make two levels of 256 directories each, and should not degrade too
I know it really doesn't matter, but using numbers like 256 and 65000
adds to the nerdy-ness of the project, which already has a nerds-only
inside-joke as a name (ArX). Unless there are compelling technical
reasons for using these binary numbers (which will at least be partly
exposed via documentation), I would prefer using numbers that are easier
for non-nerds to deal with, like 100 and 100000.
Random thought: Why not name the top-level directories 1, 2, 3 instead
of 0, 256, 512? There could be a value written in a repo config file
that indicates how many revisions go in each directory, so it's not
locked in stone for every repo on every underlying filing system for all
time. Or maybe 0000, 0001, ... so they are all the same length, and are
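To make the fixed-width naming idea concrete, here is a rough sketch. The function name, the revisions-per-directory value, and the padding width are all hypothetical; in the proposal they would come from a repo config file rather than being hard-coded.

```python
def top_level_dir(revision, revs_per_dir=256, width=4):
    """Map a revision number to a zero-padded top-level directory name.

    revs_per_dir and width are hypothetical settings that would be read
    from the repo config, so they could differ from one repo to another.
    """
    return f"{revision // revs_per_dir:0{width}d}"


if __name__ == "__main__":
    for rev in (0, 255, 256, 65535):
        print(rev, top_level_dir(rev))
```

Because every name has the same length, a plain lexical sort of the directory listing matches numeric order.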
which just copies everything over. He periodically resyncs, and the
dirhash files mean that he only has to list directories that have
changed. He hacks by getting revisions out of his own repo,
committing, and merging.
arx get panza_repo,project project_tree
commit -m "cheaper, faster, better"
arx propagate /home/quixote/repo ../panza_repo
So "propagate" would merge at a repo level? It's late and I'm tired so
I'll assume I'm missing something.
I am not sure what you mean by "merge". It is not doing what monotone
does during "merge". Propagate just copies revisions around.
Ah, the monotone model. So propagate would pull other folks' branches
into my repo, but wouldn't affect my own branches. What if I have two
repos and have worked on the "same branch" in both of them? I can see
that it could pull in my "other" revisions without conflict because they
are named by their hash. But wouldn't the sequence numbers collide? Last
time I looked at monotone, it didn't have sequence numbers.
Yep. Though I think relocate and propagate are two commands that
people are going to need to use.
Yes, but only after they are familiar with the basics. If I understand
relocate (and I don't think I do), it would only be used rarely.