Re: [Gnu-arch-users] [BUG] FEATURE PLANS: "perfect" summary deltas
From: Aaron Bentley
Subject: Re: [Gnu-arch-users] [BUG] FEATURE PLANS: "perfect" summary deltas
Date: Sat, 10 Jul 2004 14:08:22 -0400
User-agent: Mozilla Thunderbird 0.5 (X11/20040306)
Tom Lord wrote:
> aaron's alternative idea (the "separate delta directory") also aims
> to minimize roundtrips: just get a listing of that one directory
> and now you know a big chunk of the graph. First: I think it would
> interact poorly with smart servers because a smart server may be
> willing to offer up a complete graph of deltas (so, while we only
> have one roundtrip, the bandwidth can start to look pretty
> interesting in large versions).
The archive.h-level stuff needs to be of the form "what are you willing
to give me that can get me from here to here." A smart server needs to
decide at that point whether it's willing to construct a given delta.
If not, it can reply with anything relevant to the request. If it is
willing to provide the delta, it answers with a succinct "I can give you
exactly what you need".
> Second: I think it would interact poorly with smart servers because
> it requires smart servers to eagerly describe what deltas are
> available rather than seeing if, upon demand for a specific delta,
> it's handy to provide it.
It is necessary for the smart server to decide at query time (or
earlier) whether it can provide a given delta. A builder needs that
information to determine the best path.
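To make the point about path selection concrete, here is a minimal sketch of how a builder might pick the cheapest chain of deltas once the archive has said what it is willing to provide. It is not tla's builder: the function name `cheapest_path`, the `(from, to, cost)` tuple shape, and the sizes in `offers` are all assumptions made up for illustration, and plain Dijkstra is just one reasonable way to do the search.

```python
import heapq


def cheapest_path(deltas, start, goal):
    """Return (total_cost, [revisions]) for the cheapest delta chain, or None.

    deltas: iterable of (from_rev, to_rev, cost) tuples, i.e. the deltas
    the archive answered that it can provide (hypothetical shape).
    """
    graph = {}
    for src, dst, cost in deltas:
        graph.setdefault(src, []).append((dst, cost))

    queue = [(0, start, [start])]  # (cost so far, revision, path taken)
    seen = set()
    while queue:
        total, rev, path = heapq.heappop(queue)
        if rev == goal:
            return total, path
        if rev in seen:
            continue
        seen.add(rev)
        for nxt, cost in graph.get(rev, []):
            if nxt not in seen:
                heapq.heappush(queue, (total + cost, nxt, path + [nxt]))
    return None  # no chain of offered deltas reaches the goal


# Made-up offers: three single-revision changesets plus one direct delta.
offers = [
    ("base-0", "patch-1", 12),
    ("patch-1", "patch-2", 10),
    ("patch-2", "patch-3", 9),
    ("base-0", "patch-3", 20),
]
result = cheapest_path(offers, "base-0", "patch-3")
print(result)
```

The search only works if every offer carries a cost up front, which is why the server has to commit to what it can provide at query time or earlier.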
> 3. Do we or do we not muck with the archive format?
> Actually, I shouldn't say "archive format".
> Do we or do we not muck with the archive _abstraction_ because,
> with a few exceptions (like checksums and signatures) when we
> change the archive format, we're implying a change to the archive
> abstraction.
> Without some fancy footwork, some of Abentley's ideas would change
> the archive abstraction in some deep ways that impact things such
> as what smart servers can do. I see a negative impact in the
> particulars.
Yeah, and the way I see it, you're trying to change the archive
abstraction without changing the archive abstraction, which means the
builder has to do the generalization instead (deleting log files, for
example).
> The "perfect" summary delta doesn't change the archive abstraction
> at all.
The way I see it, you've distorted the meaning of a version in order to
use it for storing semi-arbitrary deltas. It's like shoving utf-8
through an interface that was designed for ASCII -- it's ugly and harder
to work with than the true representation.
> A. I dropped "commit --base" (aka "commit --tag") in tla and
> therefore, the current builder knows nothing about it.
Yeah, but on the other hand, the archive format doesn't distinguish
commit --base from a tag commit. Tag is just commit --base with no tree
changes.
> C. Your experiment is great work and very comforting.
> Something that is absolutely not a priority but that might help
> shed some light at some point is an empirically supported
> characterization of "typical" changerates and change-natures and
> their determining variables, combined with analysis about how
> effective "perfect" summaries (or any alternative) is predicted to
> be. (For now, this not being rocket science, and especially given
> checks such as yours, I trust my intuition filtered through
> feedback from others thinking about the same topic.)
pyaba can be used to determine the path of a revision changeset, so it
may be helpful here.
$ pyaba revision --patch address@hidden/tlasrc--integration--1.3--patch-5
/mnt/eagerbeavershare/arch/storage/tlasrc/tlasrc/tlasrc--integration/tlasrc--integration--1.3/patch-5/tlasrc--integration--1.3--patch-5.patches.tar.gz
Oh, and remember how I wanted to be able to calculate specific deltas?
Even if we just needed the base-0 to patch-(2^x-1) revisions, we'd get this:
address@hidden:~$ du delta*.tar.gz -s --total -h
12K delta-base-0--patch-1.tar.gz
36K delta-base-0--patch-3.tar.gz
36K delta-base-0--patch-7.tar.gz
40K delta-base-0--patch-15.tar.gz
48K delta-base-0--patch-31.tar.gz
160K delta-base-0--patch-63.tar.gz
332K total
But if we had base-0 and wanted patch-63, we'd need only the last one
(160K), which is about half the aggregate size of the summary deltas
(332K). So the arbitrary delta approach is more space-efficient than
summary deltas.
Meanwhile, the aggregate size (according to du -s --total -h) of the
simple revisions from base-0 to patch-63 is 392 K. (using
--apparent-size, it's actually smaller, but I'm talking about storage
requirements)
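For anyone who wants to check the arithmetic, the comparison above can be reproduced from the du figures. This is only a sketch: the numbers are the rounded sizes du printed, so the ratios are approximate, and the variable names are mine.

```python
# Sizes in KB as reported by du above, keyed by the patch level each
# base-0 summary delta reaches.
summary_kb = {1: 12, 3: 36, 7: 36, 15: 40, 31: 48, 63: 160}

# The patch levels are exactly the 2^x - 1 series for x = 1..6.
levels = [2**x - 1 for x in range(1, 7)]

summary_total_kb = sum(summary_kb.values())  # all six summary deltas
direct_kb = summary_kb[63]                   # the one delta needed for base-0 -> patch-63
simple_total_kb = 392                        # simple revisions base-0..patch-63, per du

print(f"summary deltas, aggregate: {summary_total_kb}K")
print(f"direct delta to patch-63:  {direct_kb}K "
      f"({direct_kb / summary_total_kb:.0%} of the summary aggregate)")
print(f"simple revisions:          {simple_total_kb}K")
```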
Aaron