

From: Aaron Bentley
Subject: Re: [Gnu-arch-users] RFC: arch protocol, smart server, and tla implementation prototypes
Date: Sun, 01 Feb 2004 00:27:46 -0500
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6b) Gecko/20031205 Thunderbird/0.4

Tom Lord wrote:

   >> so that client-side space requirements don't have to be the sum of the
   >> file sizes.

   > According to the thread, the aggregate storage requirement will
   > be roughly twice that of the equivalent DELTA.  I'd be willing
   > to make the sacrifice of storing that data temporarily to retain
   > the appearance of a single-threaded program.

I think it varies wildly depending on the project.  For example, I
can believe that the compose of 100 Linux kernel patches, mostly
working on different parts of the kernel, isn't much smaller than the
sum of those patches.  On the other hand, I'd expect the compose of
100 tla patches to be much smaller than the sum of the 100.  The 50%
figure comes from using tla as the example, but doing it every 200
KiB, not every 100 revisions.
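
To make the size intuition concrete, here's a toy model -- emphatically
NOT tla's actual changeset format, just per-file replacement text, with
composition meaning "the later edit to a file wins":

    # Toy model of changeset composition; not tla's real format.
    # A changeset maps each touched file to its replacement text.

    def compose(changesets):
        composed = {}
        for cs in changesets:
            composed.update(cs)  # a later edit to a file supersedes earlier ones
        return composed

    def size(cs):
        return sum(len(text) for text in cs.values())

    # 100 patches touching 100 different files: compose ~= sum
    disjoint = [{"file%d" % i: "x" * 1000} for i in range(100)]
    # 100 patches rewriting the same file: compose ~= one patch
    overlapping = [{"ChangeLog": "x" * 1000} for _ in range(100)]

    print(size(compose(disjoint)), sum(map(size, disjoint)))        # 100000 100000
    print(size(compose(overlapping)), sum(map(size, overlapping)))  # 1000 100000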

Also, one problem that is really worth solving in the medium term, I
think, is the "huge number of clients requesting `update'" problem.
For example, the Linux kernel is in arch, a gazillion people track the
arch sources pretty passively, the 2.9 release announcement comes out,
and a gazillion people run `update'.  If 70% of that gazillion are
asking for the same delta, and that delta is even only 50% smaller
than the sum of the patches constituting it, that's a pretty big
savings in server bandwidth if the server can compose-and-cache that
delta.
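
The caching side is simple enough to sketch.  Everything below is
hypothetical: cached_delta, and the fetch/compose helpers it takes, are
stand-ins for whatever a smart server would actually provide, and
integer revision numbers are assumed for brevity:

    # Hypothetical sketch of compose-and-cache on a smart server.
    # fetch_changeset and compose are passed in as parameters because
    # they are placeholders, not real tla APIs.

    _delta_cache = {}

    def cached_delta(from_rev, to_rev, fetch_changeset, compose):
        key = (from_rev, to_rev)
        if key not in _delta_cache:
            changesets = [fetch_changeset(rev)
                          for rev in range(from_rev + 1, to_rev + 1)]
            _delta_cache[key] = compose(changesets)
        return _delta_cache[key]

    # The second through gazillionth `update' clients asking for the
    # same (from, to) pair cost one cache lookup, not a re-compose or
    # a transfer of every constituent changeset.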
I'm not arguing that deltas aren't good, but they *are* bigger than
the changeset for a single revision.  When deltas aren't possible, I
think a case could be made for storing twice as much data,
temporarily.  But it depends on priorities.

> If you're really serious about this, I guess you could use callbacks
> to compose and delete the changesets as they come in, but it looks
> hard to me.  What if the first changeset to complete retrieval is
> the one in the middle?

Such non-determinism as you describe would be contained in the
pfs-{dav,sftp,...} layer.  I doubt there's any reason to really go
that far, but if one did, it would be contained in that layer.
Depending on the pfs-level implementation, the first changeset may be
the last to complete downloading.  In that case, you'll have to
burden the system with storing all the changesets at once anyway.
That won't happen with pipelining, but it could happen with parallel
downloads.

e.g.
patch1--------------------------------------------->complete
                1000K
patch2----patch3----patch4---->complete
      50K      100K      400K
And there are a number of in-between cases.
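
Here's a sketch of what "compose as they arrive" has to cope with,
reusing the toy dict-changesets from above; the arrival order is the
only interesting part:

    # Fold changesets into a running compose as they finish downloading,
    # in whatever order they arrive.  Changeset i can only be folded in
    # once changesets 0..i-1 have been; anything that arrives early sits
    # in `pending'.  In the worst case (patch1 above finishes last),
    # everything is buffered at once -- exactly the storage we were
    # trying to avoid.

    def compose_in_arrival_order(arrivals, total):
        pending = {}      # index -> changeset: arrived, not yet foldable
        composed = {}
        next_needed = 0
        for index, changeset in arrivals:  # yields in completion order
            pending[index] = changeset
            while next_needed in pending:
                composed.update(pending.pop(next_needed))  # fold, free it
                next_needed += 1
        assert next_needed == total
        return composed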

It's starting to look like compose_changesets should be the next "big
hard thing" I work on for tla.

(That and/or partial commits.)
I'm working on inode snapping for partial commits right now.  Still,
I tend to think partial commits aren't worth what they cost; they
could be handled with scripted undo/redo.

(There's also the tree-delta stuff for which we can't borrow code.)


> I know not this "changeset-utils" of which you speak.  Would this
> include standalone tools for creating and applying changesets?

You know -- I did have to make a decision at one point about whether
to factor out mkpatch/dopatch and distribute them separately.  That
separate distribution would also want to include inventory.
Architecturally it makes perfect sense.
I imagine we could quite easily get tla to *behave* like mkpatch when
argv[0] is "mkpatch".  Whether we *should* is a different question,
though.
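
For what it's worth, the argv[0] trick is tiny in any language (sketch
in Python for brevity, though tla itself is C; the *_main entry points
here are made up):

    # Busybox-style dispatch on the name the binary was invoked as.
    import os
    import sys

    def mkpatch_main(args):          # hypothetical entry point
        print("behaving as mkpatch:", args)

    def tla_main(args):              # hypothetical entry point
        print("behaving as tla:", args)

    def main():
        if os.path.basename(sys.argv[0]) == "mkpatch":
            mkpatch_main(sys.argv[1:])
        else:
            tla_main(sys.argv[1:])

    if __name__ == "__main__":
        main()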

Aaron



