Re: [Monotone-devel] Re: monotone CVS import failed.


From: Jon Smirl
Subject: Re: [Monotone-devel] Re: monotone CVS import failed.
Date: Sat, 28 Oct 2006 19:56:19 -0400

On 10/28/06, Markus Schiltknecht <address@hidden> wrote:
Jon Smirl wrote:
> Outside of a pack file git really does store snapshots. Each file in
> the snapshot is compressed with zlib. The snapshots are smart enough
> to share identical files using the sha1. There are no deltas.
>
> Inside a pack file git takes all the revisions of a file and sorts
> them from largest to smallest in size. It then xdiffs these and stores
> the deltas. There are no change sets or time sequential ordering.
>
> The directory trees are always snapshots, they just refer to the sha1
> of the expanded revisions. Inside a pack file the directories can
> be sorted and xdiffed like the revisions, but it doesn't help that much.
>
> If you want a change set it can be constructed by taking two snapshots
> and diffing them. git provides tools for this.
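To make the snapshot model above concrete, here is a minimal Python sketch of a content-addressed store plus a snapshot diff. It is an illustration only, not git's code: SnapshotStore and change_set are hypothetical names, and the sketch hashes the raw file bytes, whereas real git hashes a small header plus the content.

```python
import hashlib
import zlib


class SnapshotStore:
    """Toy content-addressed store: one zlib-compressed blob per distinct content."""

    def __init__(self):
        self.objects = {}  # sha1 hex digest -> zlib-compressed bytes

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        # Identical content hashes to the same key, so it is stored only once.
        self.objects.setdefault(key, zlib.compress(data))
        return key

    def get(self, key: str) -> bytes:
        return zlib.decompress(self.objects[key])

    def snapshot(self, files: dict) -> dict:
        """Store a whole tree {path: bytes} and return {path: sha1}."""
        return {path: self.put(data) for path, data in files.items()}


def change_set(old: dict, new: dict) -> dict:
    """Derive a change set after the fact by comparing two snapshots."""
    return {
        "added": sorted(p for p in new if p not in old),
        "deleted": sorted(p for p in old if p not in new),
        "modified": sorted(p for p in new if p in old and old[p] != new[p]),
    }


store = SnapshotStore()
snap1 = store.snapshot({
    "README": b"hello\n",
    "main.c": b"int main(void) { return 0; }\n",
})
snap2 = store.snapshot({
    "README": b"hello\n",
    "main.c": b"int main(void) { return 1; }\n",
    "Makefile": b"all:\n",
})
print(len(store.objects))        # README is shared between both snapshots
print(change_set(snap1, snap2))
```

Nothing in the store records what changed between snap1 and snap2; the change set is computed only when someone asks for it.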

Thank you for clarifying that. I've just read about git internals. I'm
surprised they store full files and am wondering if monotone could do
something similar, just for caching the data...

That is part of what makes git so fast. If you want the disk space
back, just run git-repack; it only takes a few minutes once every
couple of days.

The idea is to keep the stuff you are actively looking at in
semi-expanded storage (zlib compressed but not deltaed).

Or maybe that has to do with the delta vs. reverse delta vs. deltas with
full files after 100 deltas discussion which netsync speedup has brought
up. There is a wiki page about it, but I didn't find it...

I don't know the exact wire protocol, but in part of it the client
tells the server the sha1 of some of the commits that it has. This lets
the server know which files the client already has. The server then
generates a mini-pack file by generating deltas against the versions
the client is known to have. It sends this mini-pack to the client in
a big lump.

Note that none of this is change set based. It is all snapshot based.
The server is just smart enough to compress some of the files being
sent by saying, "apply this delta against this sha1 that you already
have to get the file you need for the incoming snapshot." When the
client runs git-repack the deltas may get computed totally
differently.
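
A rough Python sketch of that exchange, under loud assumptions: this is not git's wire protocol or pack format. build_mini_pack and unpack are hypothetical names, the "delta" is just zlib deflate using the base file as a preset dictionary (a stand-in for xdiff), and picking the client's version at the same path as the delta base is my own simplification.

```python
import hashlib
import zlib


def sha1(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def make_delta(base: bytes, target: bytes) -> bytes:
    # Stand-in delta: deflate the target with the base as a preset dictionary.
    comp = zlib.compressobj(zdict=base)
    return comp.compress(target) + comp.flush()


def apply_delta(base: bytes, delta: bytes) -> bytes:
    decomp = zlib.decompressobj(zdict=base)
    return decomp.decompress(delta) + decomp.flush()


def build_mini_pack(wanted: dict, client_has: dict, server_objects: dict) -> list:
    """wanted / client_has are snapshots {path: sha1}; server_objects is {sha1: bytes}."""
    known = set(client_has.values())
    pack = []
    for path, key in sorted(wanted.items()):
        if key in known:
            continue  # the client already has this exact content
        known.add(key)
        base_key = client_has.get(path)  # assumption: same path makes a good base
        if base_key is not None:
            delta = make_delta(server_objects[base_key], server_objects[key])
            pack.append(("delta", key, base_key, delta))
        else:
            pack.append(("full", key, None, zlib.compress(server_objects[key])))
    return pack


def unpack(pack: list, client_objects: dict) -> None:
    """Expand every entry back into a full file, verified against its sha1."""
    for kind, key, base_key, payload in pack:
        if kind == "delta":
            data = apply_delta(client_objects[base_key], payload)
        else:
            data = zlib.decompress(payload)
        if sha1(data) != key:
            raise ValueError("corrupt object " + key)
        client_objects[key] = data


readme = b"hello\n"
old_main = b"int main(void) { return 0; }\n" * 20
new_main = old_main + b"/* fixed */\n"
makefile = b"all:\n\tcc main.c\n"

server_objects = {sha1(x): x for x in (readme, old_main, new_main, makefile)}
client_snapshot = {"README": sha1(readme), "main.c": sha1(old_main)}
client_objects = {sha1(readme): readme, sha1(old_main): old_main}
wanted = {"README": sha1(readme), "main.c": sha1(new_main), "Makefile": sha1(makefile)}

pack = build_mini_pack(wanted, client_snapshot, server_objects)
unpack(pack, client_objects)
print([entry[0] for entry in pack])
```

After unpack the client holds full files again, so the deltas in the pack say nothing about how the client will delta the same files when it repacks.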

The same thing happens in the reverse direction to achieve peer-to-peer sync.


Thinking about the full files as cached data might help. (?)

> By introducing symbol dependencies (which cvs2svn does not do) you can
> force the second change set sequence to be generated.

Uhm.. is there a reason for not adding such symbol dependencies? It
seems obvious that branching and tagging should be handled by the
toposort as well.

One reason is that the symbol dependencies are not clear from looking
at a CVS file in isolation. You need to take into account all of the
files to figure out the right dependencies.

Another reason is that the SVN people do not view this as being a
problem. They are ok with creating symbols and branches with lots of
copying from different change sets.

In monotone's cvs_import I'm adding such edges to the graph to be
sorted. I just have trouble because I have to find out which branch
symbols belong to *before* toposorting...

That is one of the hard parts.

I've been thinking a lot about this chicken-and-egg problem. Currently
I'm tempted to try to add another graph of just all the symbols and
branches (but not the commits). Toposorting this one would allow me to
clearly assign a tag or sub-branch to a certain branch. I don't know if
it's worth it, though. The simple alternative would be to throw it all
into one huge graph.
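
One way the separate symbol graph could look, as a Python sketch (graphlib needs Python 3.9 or later). This is not monotone's cvs_import code. The symbol names and the per-file evidence are made up, and the rule "pick the candidate parent that sorts latest" is only an assumption about how ambiguous evidence might be resolved.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical evidence gathered over all files: for each symbol, the set of
# branches that at least one file suggests it sprouts from. A file that never
# changed on B1 makes TAG_1 look as if it sits on trunk, hence two candidates.
candidates = {
    "trunk": set(),
    "B1": {"trunk"},
    "TAG_1": {"trunk", "B1"},
    "B2": {"trunk", "B1"},
    "TAG_2": {"B2"},
}


def assign_parents(candidates: dict):
    """Toposort the symbols alone, then give each one its most specific parent."""
    # static_order() yields every symbol after all of its candidate parents and
    # raises CycleError if the evidence from different files is contradictory.
    order = list(TopologicalSorter(candidates).static_order())
    rank = {name: i for i, name in enumerate(order)}
    parents = {}
    for symbol, seen in candidates.items():
        # Assumption: the candidate that sorts latest is the real parent.
        parents[symbol] = max(seen, key=rank.__getitem__) if seen else None
    return order, parents


order, parents = assign_parents(candidates)
print(parents)

try:
    assign_parents({"A": {"B"}, "B": {"A"}})
except CycleError:
    print("contradictory symbol evidence")
```

The commits never enter this graph, so it stays small; whether the resulting parent assignment is good enough for real repositories is exactly the open question.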

No one has built a system for doing this yet so you are a pioneer. Try
several schemes and see which one works.

--
Jon Smirl
address@hidden



