Re: [Gnu-arch-users] Getting revisions w/ HTTP protocol and time to comp
From: Jason McCarty
Subject: Re: [Gnu-arch-users] Getting revisions w/ HTTP protocol and time to complete
Date: Thu, 16 Oct 2003 18:08:10 -0400
User-agent: Mutt/1.5.4i
Edouard Gomez wrote:
> Hello,
>
> I know there was a discussion about intermediate patch-(N)-(N+M)
> patches to speed up the "tla get" command w/ distant archives.
>
> As I don't remember all the details, and I can't find the thread in the
> archive (this last month is only one big thread covering a lot of
> topics... :-( ), I would like to have some news about this topic.
The thread is here:
http://mail.gnu.org/archive/html/gnu-arch-users/2003-09/msg01419.html
I proposed (in a rough sense) one implementation of "summary deltas,"
but unfortunately I've been too busy/lazy to go ahead and write a
test-implementation to benchmark with. There is also some question as to
which algorithm will perform best in different circumstances.
> In my case, having intermediate patches would not really be worth it,
> because patches are really small, and thus getting the patch/applying
> the patch is not a costly operation. I get this for only 70 revisions:
>
> $ time tla get address@hidden/xvidcore--devapi4--1.0
> real 1m9.004s
> user 0m5.632s
> sys 0m3.024s
The time advantage of summary deltas can be pretty large, while the
bandwidth reduction is maybe 2x. I think the reason for this is that
patch application time is partly dependent on the number of files
modified (due to the inventory I would imagine). So if you have N
patches which modify an average of M files each, total time to apply
them is proportional to M*N. Since patches in a series often modify the
same files, a summary delta of those patches has the potential to modify
many fewer than M*N files, so that patch-application time is greatly
reduced.
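To make the M*N argument concrete, here is a toy sketch in Python. It
assumes, as a simplification, that the cost of applying a patch is just the
number of files it touches. The file names and the four patches are made up
for illustration, and this is not how tla measures anything.

```python
# Toy model: each patch is represented only by the set of files it touches.
# The file names and patch contents below are made up for illustration.

def sequential_cost(patches):
    """Files processed when every patch is applied one after another."""
    return sum(len(files) for files in patches)

def summary_cost(patches):
    """Files processed when the series is collapsed into one summary delta."""
    touched = set()
    for files in patches:
        touched |= files
    return len(touched)

patches = [
    {"Makefile", "src/a.c"},
    {"src/a.c", "src/b.c"},
    {"src/a.c"},
    {"src/b.c", "README"},
]

print(sequential_cost(patches))  # 7 file modifications, one patch at a time
print(summary_cost(patches))     # 4 file modifications with one summary delta
```

The more the patches in a series overlap, the further the second number
drops below the first.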
On my computer generating tla--devo--1.1--patch-133 from base-0 takes
about 1m05s. With summary deltas taken every 200KB, patching takes only
about 4s. It's just another time/space tradeoff you can make, depending
on the density of the summaries.
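One possible reading of "taken every 200KB" is sketched below in Python: a
summary delta is stored whenever the patches accumulated since the last
summary reach a size threshold. That rule, the function name and the patch
sizes are all assumptions for the sake of the example, not a description of
an existing tla feature.

```python
def summary_points(patch_sizes, threshold):
    """Return indices of patches after which a summary delta would be stored.

    patch_sizes: sizes of consecutive patches (any unit, e.g. KB).
    threshold:   accumulated size that triggers a new summary (same unit).
    """
    points = []
    accumulated = 0
    for index, size in enumerate(patch_sizes):
        accumulated += size
        if accumulated >= threshold:
            points.append(index)
            accumulated = 0  # start counting again after the summary
    return points

# Hypothetical patch sizes in KB.
sizes_kb = [50, 120, 40, 10, 300, 20, 90, 95]
print(summary_points(sizes_kb, 200))  # [2, 4, 7]
```

A smaller threshold gives denser summaries (faster gets, more space on the
server); a larger one gives the opposite.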
> On this 1m9s total, ~30s are spent going through every revision dir to
> find a cached rev.
We'd have to see how well summaries would perform here. One feature is
that they can often reduce the number of directories which have to be
examined (that is, if you have a pristine or revlib entry sitting
around).
> For comparison, getting it from CVS, I get:
> $ time cvs -z9 -d:pserver:address@hidden:/xvid co -r dev-api-4 xvidcore
> real 0m9.824s
> user 0m0.204s
> sys 0m0.141s
I think you could probably do as well with summaries, if you don't mind
wasting space on the server (it shouldn't waste bandwidth though).
> So as a first step, for speeding up the get command, is there somewhere
> (an archive :-) an implementation of a kind of "super" .listing
> file that recursively lists the contents of the revision dirs for all
> category versions? This way the cached rev searching could be replaced
> by a simple file retrieval + string search.
This could certainly help, but I wonder how big the index would have to
get before the cost of downloading it outweighed the cost of an average
recursive search...
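A rough way to estimate that break-even point, sketched in Python. The cost
model is an assumption: the recursive search costs a fixed time per
revision directory, and the index costs one request plus its transfer time.
The 30s for 70 revisions comes from the timing quoted above; the 50 KB/s
bandwidth is a made-up figure.

```python
def recursive_search_seconds(num_dirs, seconds_per_dir):
    """Time to look into every revision directory, one request each."""
    return num_dirs * seconds_per_dir

def index_download_seconds(index_bytes, bytes_per_second, request_overhead):
    """Time to fetch a single index file of the given size."""
    return request_overhead + index_bytes / bytes_per_second

def break_even_index_bytes(num_dirs, seconds_per_dir,
                           bytes_per_second, request_overhead):
    """Largest index that is still no slower than the recursive search."""
    budget = recursive_search_seconds(num_dirs, seconds_per_dir) - request_overhead
    return max(0.0, budget * bytes_per_second)

# ~30s for 70 revision dirs (from the quoted timing); bandwidth is assumed.
per_dir = 30.0 / 70
limit = break_even_index_bytes(70, per_dir, 50_000, per_dir)
print(f"index pays off below roughly {limit / 1e6:.1f} MB")
```

Under these assumptions the index would have to get quite large before it
lost to the directory walk, but the real answer depends on the link and on
how the index is laid out.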
> This "super" index would not fill the gap between tla and cvs, but it
> would make tla feel much more responsive as it would find the cached rev
> faster, thus beginning what the user perceives as the "real" get
> command, i.e. get patch levels and apply them.
Sure, it would nicely reduce the amount of time before tla mentions
finding a cachedrev while you're wondering, "is tla doing anything or
not?". You'd have to include a marker or something for continuations in
order to make it work though.
Jason