From: Tom Lord
Subject: Re: [Gnu-arch-users] RFC: arch protocol, smart server, and tla implementation prototypes
Date: Fri, 30 Jan 2004 17:36:04 -0800 (PST)

    > From: Aaron Bentley <address@hidden>

    >> In general, streaming is going to require the client-side application
    >> logic to generate lots of requests before consuming the data they
    >> return.  That's awkward, client-side, and there's only a few special
    >> cases where it would be worthwhile.  More importantly: I think that
    >> those special cases will most often have better solutions than
    >> (literal) streaming.

    > I disagree: solving the latency problem for archd is nice, but
    > solving it for all protocols is nicer.  And I think I'll get
    > close to that functionality in the backwards-builder anyhow.

Um... I _think_ you're talking nonsense.  The backwards-builder is
nice but it's orthogonal to what we're talking about.

"Solving for all protocols [using streaming]" would mean that the
revision builder, whether forwards or backwards, would have to issue N
getpatch requests all at once, then read N replies after that.

Knowing what N revisions to request is no big deal.   You figure that
out up-front going forwards or backwards.

But how do I process those N incoming replies?  One idea is that I
read each one, apply it, then read the next.  Another idea is that I
read them all up front, store them away, and then apply them one at a
time.  Both ideas are losing for two reasons: (1) you _can't_ do
server-side changeset composition that way; (2) you're making N calls
to apply_changeset.  The first of the two ideas is additionally lame
because, over TCP, at least, it won't really be streaming after all.

The third idea is that you add a DELTA function to the archive.h
vtable.  _That_ solves the problem for all protocols.  It can truly
stream to dumb-fs'es just fine (assuming that compose changesets isn't
_too_ expensive).  It can additionally benefit from server-side
changeset composition by a smart server.


    > Making build_revision work backwards looks relatively easy.
    > Just find out how many revisions away you have a library or a
    > cacherev, and build in that direction.

Sure.  Glad to hear it.  Never expected anything different.   

I _suspect_ you'll find that it will take some tweaking to get the
heuristics exactly right.  Ancestry-distance isn't a great heuristic
-- you want to approximate weighting for archive latency and bandwidth
too.


    > Crossing tag boundaries will make the problem harder.  While tla
    > implicitly uses the call stack to build forwards, crossing tag
    > boundaries requires tla to map out several paths, and determining the
    > best one will require a cost assessment based on

    > 1. aggregate download size
    > 2. download cost for a given archive

Are you aware of the previous threads about these kinds of heuristics?
I think they were mostly between Miles and me.  If not, I can try to
find them again.


    > The archd pfs can merge the requests for "patch-1, patch-2, patch-3"
    > into "delta patch-1 patch-3" with little difficulty.  Since pfs-archd
    > will be alone in supporting this functionality, it makes sense to
    > special-case for archd instead of special-casing the current supported
    > protocols.

Please don't make the archive.h protocol asynchronous -- there's just
no need for it.

    > >     If from-revision is not * or
    > >     is not the immediate ancestor of to-revision, then implementations
    > >     MAY instead return an error.

    > I don't believe this provision is required.  Instead:

    > >     If from-revision is not * or is not the immediate ancestor of
    > >     to-revision, the server may return more than one changeset.
    > >     The composition of the changesets returned describes the
    > >     differences between the two revisions.

    > Hmm.  Looks awfully like streaming to me.

Yes, _like_ streaming in its benefits but _not_ streaming in how it
works.

Maybe we're just using words differently.

To me, streaming refers to a situation where there's a request/reply
dialog between client and server but it's an asynchronous protocol
rather than a synchronous one.   In particular, the client will send a
whole bunch of requests before reading any answers.

None of the underlying protocols (including archd) preclude streaming
in any serious way.   They're all streamable.

The question, though, is whether streaming should be exposed as a
property of the archive.h vtable, and the answer is "no".


    > >     The client MAY include a Parts-limit header containing a single,
    > >     positive integer.   The server MUST NOT reply with a greater
    > >     number of changesets than that.

    > I don't understand the motivation here.  Is it to avoid biting off more
    > than you can chew, bandwidth or storage-wise?  (If so, wouldn't an
    > Aggregate-size-limit header be better?)


The intention, actually, is to support the kind of build_revision
heuristics that you're working on.

For example, I might have everything I need in a local revision
library to compute a given DELTA.

On the other hand, I'm also talking to a server who can provide that
DELTA.

I can estimate that receiving and composing N changesets from the
server costs almost as much as computing the delta directly from my
revlib.   So I can ask the server "can you give me this delta in less
than N changesets?"




    > pfs-sftp, pfs-http, pfs-dav etc. can use arch_compose_changesets plus
    > streaming or multiple simultaneous connections to implement
    > arch_archive_delta.

That's exactly correct.   At the same time, streaming is _not_ thereby
exposed in the archive.h vtable.


    > I'm not sure that there's any value in composing the changesets
    > before applying them, though.  It would probably be better to
    > say arch_archive_delta can return any number of changesets, and
    > apply them directly.

It takes pressure off the need to find (not obviously perfectable)
inventory-traversal-elimination optimizations in apply_changeset.  

It also eliminates the need to buffer incoming changesets if
apply_changeset is too slow.   If apply_changeset is too slow, you
don't _really_ get streaming at all as soon as your kernel's buffers
for the socket fill up.


    >> Other merge commands can take good advantage of
    >> arch_compose_changesets as well.

    > If you're thinking of tla delta, I suspect that
    > compose-changesets will be implemented in terms of replay and
    > make-changeset, not the other way around.

That's just completely wrong.   `replay' needs a tree to operate on.
compose-changesets most certainly does not.


    >> The advantage of this approach over streaming is that it can be
    >> implemented in two ways (or a mix of two ways): Changeset composition
    >> can take place either client-side or server-side.

    > But why would we want to merge changesets on the client side before
    > applying them?

Hopefully the above has made that clearer.


-t




