From: Jason McCarty
Subject: Re: [Gnu-arch-users] Getting revisions w/ HTTP protocol and time to complete
Date: Thu, 16 Oct 2003 18:08:10 -0400
User-agent: Mutt/1.5.4i

Edouard Gomez wrote:
> Hello,
> 
> I know there was a discussion about intermediate patch-(N)-(N+M)
> patches to speed up the "tla get" command w/ distant archives.
> 
> As I don't remember all the details, and I can't find the thread in the
> archive (this last month is just one big thread covering a lot of
> topics... :-( ), I would like some news about this topic.

The thread is here:
http://mail.gnu.org/archive/html/gnu-arch-users/2003-09/msg01419.html

I proposed (in a rough sense) one implementation of "summary deltas,"
but unfortunately I've been too busy/lazy to go ahead and write a
test-implementation to benchmark with. There is also some question as to
which algorithm will perform best in different circumstances.

> In my case, having intermediate patches would not really be worth it,
> because the patches are really small, and thus getting and applying a
> patch is not a costly operation. I get this for only 70 revisions:
> 
> $ time tla get address@hidden/xvidcore--devapi4--1.0
>  real 1m9.004s
>  user 0m5.632s
>  sys 0m3.024s

The time advantage of summary deltas can be pretty large, while the
bandwidth reduction is maybe 2x. I think the reason for this is that
patch application time is partly dependent on the number of files
modified (due to the inventory I would imagine). So if you have N
patches which modify an average of M files each, total time to apply
them is proportional to M*N. Since patches in a series often modify the
same files, a summary delta of those patches has the potential to modify
many fewer than M*N files, so that patch-application time is greatly
reduced.
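
To make the counting argument concrete, here is a rough sketch in Python. It
is only a toy model: each patch is represented as the set of file names it
modifies, the file names are made up, and the function names are
hypothetical (nothing here is tla code). It compares the number of file
modifications done patch by patch with the number a single summary delta
over the same series would need.

```python
def per_patch_file_touches(patches):
    """Files modified when every patch is applied in turn (the M*N cost)."""
    return sum(len(files) for files in patches)


def summary_file_touches(patches):
    """Files modified by one summary delta: each distinct file counts once."""
    return len(set().union(*patches)) if patches else 0


# Four made-up patches that keep modifying the same few files.
patches = [
    {"Makefile", "src/a.c"},
    {"src/a.c", "src/b.c"},
    {"src/a.c", "ChangeLog"},
    {"src/b.c", "ChangeLog"},
]

print(per_patch_file_touches(patches))  # 8 (N=4 patches, M=2 files each)
print(summary_file_touches(patches))    # 4 distinct files
```

The more the patches in a series overlap, the bigger the gap between the two
numbers gets.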

On my computer generating tla--devo--1.1--patch-133 from base-0 takes
about 1m05s. With summary deltas taken every 200KB, patching takes only
about 4s. It's just another time/space tradeoff you can make, depending
on the density of the summaries.
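
As an illustration of the density tradeoff, here is a small Python sketch.
It assumes one particular scheme, which may not match what I benchmarked: a
summary delta is cut whenever the patches accumulated since the previous
summary reach a size threshold, and each summary covers everything since the
previous one. The function names and the patch sizes are hypothetical.

```python
def summary_points(patch_sizes, threshold):
    """Patch levels (1-based) at which a summary delta is cut.

    A summary is cut once the patches accumulated since the previous
    summary total at least `threshold` (same unit as patch_sizes).
    """
    points = []
    accumulated = 0
    for level, size in enumerate(patch_sizes, start=1):
        accumulated += size
        if accumulated >= threshold:
            points.append(level)
            accumulated = 0
    return points


def deltas_to_build(target, points):
    """Deltas applied to reach patch level `target` from base-0.

    One per summary ending at or before `target`, plus the individual
    patches after the last such summary.
    """
    used = [p for p in points if p <= target]
    last = used[-1] if used else 0
    return len(used) + (target - last)


sizes_kb = [50] * 10                  # ten made-up patches of 50KB each
points = summary_points(sizes_kb, 200)
print(points)                         # [4, 8]
print(deltas_to_build(10, points))    # 4 (two summaries + patches 9 and 10)
```

A lower threshold means more summaries stored on the server and fewer deltas
to apply; a higher one means the opposite.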

> Of this 1m9s total, ~30s are spent going through every revision dir to
> find a cached rev.

We'd have to see how well summaries would perform here. One feature is
that they can often reduce the number of directories which have to be
examined (that is, if you have a pristine or revlib entry sitting
around).

> For comparison, getting it from CVS, I get:
> $ time cvs -z9 -d:pserver:address@hidden:/xvid co -r dev-api-4 xvidcore
> real 0m9.824s
> user 0m0.204s
> sys 0m0.141s

I think you could probably do as well with summaries, if you don't mind
wasting space on the server (it shouldn't waste bandwidth though).

> So as a first step for speeding up the get command, is there somewhere
> (an archive :-) an implementation of a kind of "super" .listing file
> that recursively lists the contents of the revision dirs for all
> category versions? This way the cached rev search could be replaced
> by a simple file retrieval + string search.

This could certainly help, but I wonder how big the index would have to
get before the cost of downloading it outweighed the cost of an average
recursive search...
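
A back-of-the-envelope way to estimate that break-even point, sketched in
Python. The model is an assumption: the recursive search costs a fixed time
per revision dir, and fetching the index costs one round trip plus transfer
time. The ~30s for 70 revisions comes from your numbers above; the bandwidth
and round-trip values are invented for illustration, and the function names
are hypothetical.

```python
def recursive_search_seconds(revision_dirs, seconds_per_dir):
    """Time spent walking every revision dir looking for a cached rev."""
    return revision_dirs * seconds_per_dir


def index_download_seconds(index_bytes, bytes_per_second, round_trip_seconds):
    """Time to fetch one index file: a round trip plus the transfer itself."""
    return round_trip_seconds + index_bytes / bytes_per_second


def break_even_index_bytes(revision_dirs, seconds_per_dir,
                           bytes_per_second, round_trip_seconds):
    """Largest index size that is still no slower than the recursive search."""
    budget = (recursive_search_seconds(revision_dirs, seconds_per_dir)
              - round_trip_seconds)
    return max(0.0, budget * bytes_per_second)


# ~30s over 70 revision dirs (from the timings above); assumed 50 KB/s link
# and 0.4s round trip.
limit = break_even_index_bytes(70, 30 / 70, 50_000, 0.4)
print(round(limit))  # roughly 1.5 million bytes under these assumptions
```

Under these made-up link numbers the index would have to get quite large
before it lost to the search, but a slower link or a faster per-dir lookup
shrinks the margin.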

> This "super" index would not close the gap between tla and cvs, but it
> would make tla feel much more responsive, as it would find the cached
> rev faster and so start sooner on what the user perceives as the "real"
> get command, i.e. getting patch levels and applying them.

Sure, it would nicely reduce the amount of time before tla mentions
finding a cachedrev while you're wondering, "is tla doing anything or
not?". You'd have to include a marker or something for continuations in
order to make it work though.

Jason



