From: Tom Lord
Subject: Re: [Gnu-arch-users] RFC: arch protocol, smart server, and tla implementation prototypes
Date: Sat, 31 Jan 2004 12:01:05 -0800 (PST)



    > From: Aaron Bentley <address@hidden>

    >> Please don't make the archive.h protocol asynchronous -- there's just
    >> no need for it.

    > I meant that you'd submit the jobs in batches, and let pfs-foo worry 
    > about the optimal way to implement them-- streaming (http?), parallel 
    > (ftp?), or serial (fs).

This is the crux of the matter.

    > e.g, you'd have
    >   int pfs_get_files (struct arch_pfs_session * p, int num_files,
    >                      int * out_fd, t_uchar ** path, int soft_errors)

    > It would return when all requests were satisfied or when one proved 
    > impossible.

Excellent.  So we have been using phrases like "streaming the archive
protocol" in different ways.  And what you suggest there is roughly
what I propose as well.  Slight difference: I think that pfs_get_files
should be asynchronous and let you retrieve the files one at a time so
that client-side space requirements don't have to be the sum of the
file sizes.

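Roughly, I'm picturing something like this -- every name below is
illustrative rather than the real pfs interface; the point is only
that requests are queued up front and results come back one file at a
time:

        /* Illustrative sketch only -- not the actual libarch/pfs.h API. */
        struct arch_pfs_session;

        /* Queue a fetch of PATH; returns a request id >= 0, or -1 on error. */
        extern int pfs_queue_get_file (struct arch_pfs_session * p,
                                       const char * path);

        /* Wait for the next completed fetch, write its contents to OUT_FD,
         * and return the request id it satisfied (-1 once the queue is
         * drained).  The client never holds more than one file at a time. */
        extern int pfs_next_file (struct arch_pfs_session * p,
                                  int out_fd);
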
I think most of the confusion here is that we've got several different
protocol layers and it isn't always clear who's talking about which
ones when.

Ok, then -- just for the archives -- here's my summary of where I
think we are:

* Problem Statement

  build_revision, arch_arbitrary_delta (which star-merge could and
  should but doesn't happen to use), and update all need to solve
  what we could name the "related revisions delta problem".
  build_revision, get, and update are a special case.

  For example suppose that I have a version:

        base-0          # import revision
        patch-1         # normal changeset revision
        ...                  ...
        patch-K         # normal changeset revision
        ...                  ...
        patch-N         # normal changeset revision

  And I 

        tla get patch-K

  Absent revlib and cachedrev shortcuts, build_revision needs to:

        fetch and unpack base-0
        apply delta (base-0, patch-K) to that tree

  What it will currently do to accomplish that is:

        fetch and unpack base-0
        fetch patch-1
        apply patch-1
        fetch patch-2
        apply patch-2
        ...

  Similarly, if I then:

        tla update 

  to catch up to patch-N, that will:

        fetch patch-K+1
        apply patch-K+1
        fetch patch-K+2
        apply patch-K+2
        ...

  arbitrary_delta and star-merge solve a similar problem but in their
  case the needed cumulative-delta can cross boundaries between
  versions and isn't always a simple composition of changesets stored by
  commit.  

  There are three sources of inefficiency there:

** client side inefficiency

  build_revision and update are applying K and (N - K) patches 
  there.

** network latency penalty

  Each "fetch" involves a server roundtrip.   Whatever the latency
  cost for a roundtrip is for the server in question, we're 
  paying K or (N-K) times that latency penalty.


** network bandwidth penalty

  K or (N-K) changesets are being fetched -- but the odds are pretty
  good that the useful information in that list of changesets can
  be substantially compressed by composing them and shipping just
  one changeset.   Whatever the average size overhead per changeset
  is, we're needlessly paying (K-1) or (N-K-1) times that bandwidth
  penalty.



* Solution Space

  Roughly speaking, therefore, there are three big opportunities for
  speed-ups.

** reduce the number of changesets that need to be applied

  If build_revision and update can get by applying just _one_
  changeset rather than K and (N-K), that's a big speedup.
  (Of course, the cost of producing that single changeset must
  not exceed the benefits.)

** reduce the number of server roundtrips

  If build_revision and update can ask for all the changesets they
  need in a single archive transaction, our latency penalty 
  goes away.

** reduce the amount of data sent

  If a server happens to be able to "pre-compose" a list of
  changesets, we can make our bandwidth penalty go away.
  As a side effect, we'll also be reducing the number of changesets
  that need to be applied.


* Solution Implementation

  When we talk about build_revision and update, we're
  talking about code paths that look (essentially) like this:

        for (x = 0; x < n_needed_patches; ++x)
          {
            arch_get_patch (needed_patches[x]);
            apply_patch ([...]);
          }

   but there's a special property that we can count on.
   Specifically: 

        needed_patches[0]

   is an ancestor of:

        needed_patches[n_needed_patches - 1]

   In other words, the for loop above is equivalent to:

        patch = delta (needed_patches[0],
                       needed_patches[n_needed_patches - 1]);
        apply_patch (patch);

   and 
       delta (needed_patches[0],
              needed_patches[n_needed_patches - 1])

   is known to be the compose of the whole list of needed_patches.

   Which is handy for several reasons:

   1) it calls apply_patch only once -- thus it addresses our
      client side inefficiency

   2) _if_ a server can compute and return that delta directly,
      that completely solves our network latency and bandwidth
      issues.   It's exactly one roundtrip and the changeset returned
      is as compressed as it could possibly be.

      The caveat is, of course, server-side cost -- but that's a
      worthwhile trade-off in a number of circumstances.

   3) _even_if_ a server can _not_ compute and return that delta
      directly,  so long as the /underlying transport/ is streamy,
      if the archive protocol asks for that entire delta at once,
      the N patch fetches can be streamed and the results composed
      client-side.   That does nothing for the bandwidth problem,
      but it helps a lot with the latency problem and perfectly solves
      the client-side inefficiency.

      There's a caveat, of course: we really only win here if composing
      the incoming patches can be done fast enough to keep up with
      them as they stream in.   Still -- that's a lot more likely to
      be the case than that we can make apply_changeset fast enough to
      keep up.  (A rough sketch of this compose-as-you-stream loop
      follows this list.)
  

   4) _even_if_ a server can _not_ be streamed and can _not_ compute
      the delta server-side (and, native file system archives are in
      this category) -- the underlying transport can still fetch 
      the patches one at a time and compose them.   That addresses
      _only_ the client-side inefficiency but notice that native FS
      is the primary transport we're talking about here -- so latency
      should not be much of an issue in the first place.

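   Just to make point (3) above concrete, here's that rough sketch of
   the compose-as-you-stream loop.  Every name below is illustrative --
   this is not the real libarch interface, only the shape of the thing:

        /* Illustrative sketch only -- not actual tla code. */
        #include <stddef.h>

        struct arch_pfs_session;
        struct arch_changeset;

        extern int pfs_queue_get_file (struct arch_pfs_session * p,
                                       const char * path);
        extern struct arch_changeset *
        pfs_next_changeset (struct arch_pfs_session * p);
        extern void compose_changesets (struct arch_changeset * accum,
                                        struct arch_changeset * next);
        extern void apply_changeset (struct arch_changeset * cs,
                                     const char * tree);

        /* Request every needed changeset up front (one streamed burst of
         * requests), fold each one into a running composite as it arrives
         * (results are assumed to come back in request order), and touch
         * the project tree exactly once at the end. */
        static void
        build_by_streamed_compose (struct arch_pfs_session * p,
                                   const char ** needed_patches,
                                   size_t n_needed_patches,
                                   const char * tree)
        {
          size_t x;
          struct arch_changeset * accum = NULL;

          for (x = 0; x < n_needed_patches; ++x)
            pfs_queue_get_file (p, needed_patches[x]);

          for (x = 0; x < n_needed_patches; ++x)
            {
              struct arch_changeset * next = pfs_next_changeset (p);

              if (!accum)
                accum = next;
              else
                compose_changesets (accum, next);
            }

          if (accum)
            apply_changeset (accum, tree);
        }
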
   So I think the thing to do is to modify the archive protocol --
   libarch/archive.h -- to add a `delta' command.

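   For concreteness, the new entry point might look roughly like this
   (a sketch only -- apart from the name arch_archive_delta, nothing
   here should be read as settled interface):

        /* Sketch: ask the archive for a single changeset equivalent to
         * the compose of everything between FROM_REVISION and
         * TO_REVISION, written (say, as a tarred changeset) to OUT_FD. */
        struct arch_archive;

        extern int arch_archive_delta (struct arch_archive * arch,
                                       const char * from_revision,
                                       const char * to_revision,
                                       int out_fd);
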
   Initially, archive-pfs.c (which implements the protocol for all
   transports except arch-archd) can do nothing more than compose
   incoming patches -- that will address the client-side inefficiency.

   Later, the libarch/pfs.h vtable --- the list of functions which
   dumb-fs transports have to provide -- can be extended to include an
   asynchronous "multi-file fetch" command.  And archive-pfs.c can use
   that to stream the requests for the individual changesets.   
   That will address the latency issues.

   (The order of those two steps is variable.  If multi-file fetch
   shows up first, archive-pfs.c can take advantage of it in other
   ways.  Also note that a generic implementation of multi-file fetch
   can be provided that will work correctly (albeit without optimizing
   anything) for all pfs-*.c implementations -- so there's no prereq
   that it needs to be implemented separately for each dumb-fs
   transport before it can come on-line.)

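   A sketch of that generic fallback, where pfs_get_file stands in for
   whatever the per-transport single-file fetch is actually called:

        /* Generic (non-optimizing) multi-file fetch: just loop over the
         * existing single-file fetch.  Names here are illustrative only. */
        struct arch_pfs_session;

        extern int pfs_get_file (struct arch_pfs_session * p,
                                 int out_fd,
                                 const char * path);

        static int
        pfs_get_files_generic (struct arch_pfs_session * p,
                               int num_files,
                               int * out_fd,
                               const char ** path)
        {
          int x;

          for (x = 0; x < num_files; ++x)
            if (pfs_get_file (p, out_fd[x], path[x]) < 0)
              return -1;        /* caller decides how soft this error is */

          return 0;
        }
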
   There isn't anything that archive-pfs.c can do at this stage
   for the bandwidth problem but, if the proposed archive-cachedelta
   command were added, then archive-pfs.c could take advantage of that
   in `delta' in order to help with bandwidth issues.

   Meanwhile:  The arch-archd protocol can support the DELTA protocol
   directly.  arch-archd doesn't work via the pfs layer.   It doesn't
   and shouldn't translate archive.h protocol into dumb fs requests.
   The simplest thing is to implement delta directly.

   Initially, an archd server can reply to DELTA just by providing a
   list of all the changesets -- the client side of archd can compose
   those.   That solves latency and client-side inefficiency.

   Later, perhaps, archd servers can have a "plugin" which can 
   sometimes compose some changesets server-side.   That solves
   bandwidth.


--------------------------------

About backwards building:

    >> I _suspect_ you'll find that it will take some tweaking to get the
    >> heuristics exactly right.  Ancestry-distance isn't a great heuristic
    >> -- you want to approximate weighting for archive latency and bandwidth
    >> too.

    > Sorry, I meant "making build_revision work backwards *without*
    > crossing tag boundaries" looks relatively easy.  In that case,
    > you're working with a single archive, so there are no
    > differences in latency.  Since this is a temporary stage, I
    > don't want to spend too much time on its heuristics.

That's fine.  Even within a single archive there are choices such as
when to use an archive cached rev vs. when to build-by-patching.

For example, untar'ing a cacherev from a local-fs archive is generally
faster than copying one from a library.   An untar plus a couple of
changeset applications may be preferable to copying a revision already
present in a library.

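Just to show the shape of the comparison I mean (the names and the
idea of scoring in seconds below are invented for illustration, not
anything tla measures today):

        /* Toy cost model: prefer "untar a cacherev, then apply a few
         * changesets" when that looks cheaper than copying a prebuilt
         * revision out of a library.  Everything here is invented. */
        static int
        prefer_cacherev (double untar_secs,
                         double secs_per_changeset,
                         int n_changesets,
                         double library_copy_secs)
        {
          return (untar_secs + secs_per_changeset * n_changesets)
                 < library_copy_secs;
        }
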


    > >    > Crossing tag boundaries will make the problem harder.  While tla
    > >    > implicitly uses the call stack to build forwards, crossing tag
    > >    > boundaries requires tla to map out several paths, and determining
    > >    > the best one will require a cost assessment based on

    > >    > 1. aggregate download size
    > >    > 2. download cost for a given archive

    > >Are you aware of the previous threads about these kinds of heuristics?
    > >I think it was mostly between Miles and me.  If not, I can try to find
    > >it again.

    > I'd like to read it.  Is this it?
    > http://mail.gnu.org/archive/html/gnu-arch-users/2003-09/msg01419.html

That'd be the thread.  (And, btw, I didn't mean to slight others by
mentioning only Miles --- I just recalled that that would be a good
way to find the thread.)


    > Looks like.  5x to 25x speedups and half the size for "summary deltas" 
    > of 200 KiB does sound worthwhile.  My concern is that the speedup on the 
    > client side might be produced by pushing that CPU cost onto the
    > server. 

A few things: summary deltas don't _have_ to be computed server-side.
Part of the beauty of the idea is that it works out fine even on
dumb-fs servers if users can manage them explicitly (just like
cachedrevs).

    > Do we know that composing changesets will be significantly cheaper?  If 
    > they are, we should be doing it client-side too.

We _don't_, in an empirical sense, know that composing changesets will
be significantly cheaper; however, I think it's a safe bet.  Changeset
size is (roughly, and only in the expected case) a function of the
amount of human work that goes into one commit -- relatively constant
across projects.   Tree-size is arbitrary.    Faced with some
O(tree-size) algorithms that can be avoided by changeset composition
there are two strategies: try to eliminate the O(tree-size) steps from
the algorithms or use changeset composition.    You and others have
done a lot of work so far on eliminating O(tree-size) steps (and
that's very good) --- the other path wants exploring, too.

It will be a little tricky.

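Part of the trickiness is the per-file case analysis.  Purely as
illustration (this isn't the real tla changeset representation),
composing two changesets means collapsing each file's pair of
operations into a single operation in the composite:

        /* Illustrative only.  For each file, the pair of operations from
         * the two changesets has to collapse into one, e.g.:
         *
         *   ADD    then PATCH   =>  ADD (with the patched contents)
         *   PATCH  then PATCH   =>  PATCH (the two diffs merged)
         *   ADD    then DELETE  =>  nothing at all in the composite
         *   RENAME then PATCH   =>  RENAME, plus PATCH against the new name
         *   PATCH  then DELETE  =>  DELETE (the intermediate diff dropped)
         */
        enum file_op { OP_NONE, OP_ADD, OP_PATCH, OP_RENAME, OP_DELETE };
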


    > Okay, but I think it should be fine to allow some or all requests to be 
    > batched.  This keeps the funky code isolated and low-level, leaving a 
    > pretty clear and usable interface.  

Right.  And I was freaking out because I thought you were saying that
that was exactly the direction you did not want to go.   Adding
arch_archive_delta to the archive.h protocol is a nice way to increase
the utility of pfs_get_files.

    >>> I'm not sure that there's any value in composing the changesets
    >>> before applying them, though.  It would probably be better to
    >>> say arch_archive_delta can return any number of changesets, and
    >>> apply them directly.

    >> It takes pressure off the need to find (not obviously perfectable)
    >> inventory-traversal-elimination optimizations in apply_changeset.

    > Perhaps it is ideally better.   But compose_changesets
    > - will be new, buggy code

So are the steps taken to avoid traversals in apply_changeset.

    > - affects a core function

So do all the alternatives.

    > - may, depending on implementation, require users to install
    > interdiff

It will add new code for diff composition to tla but I don't see any
good reason to add a new dependency.

    > - looks about as complex as those optimizations

I'm not sure I agree, just because patchset composition is more like a
math problem and less like a "what weird things can happen on the
filesystem" problem.

Also, you left out that compose_changeset is useful server side and
for `changeset-utils'.

-t



