gnu-arch-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gnu-arch-users] [PATCH] arch speedups on big trees


From: Chris Mason
Subject: Re: [Gnu-arch-users] [PATCH] arch speedups on big trees
Date: Fri, 19 Dec 2003 14:32:44 -0500

On Fri, 2003-12-19 at 14:08, Tom Lord wrote:
>     > From: Chris Mason <address@hidden>
> 
>     > I've been playing around with a few ideas to improve arch performance on
>     > large source trees, mostly in the area of applying changesets, and
>     > creating changesets.  I've got a sample archive here with 100 changesets
>     > on top of the linux 2.6 kernel, and vanilla arch takes  a number of
>     > minutes to apply them all (15-30 seconds per changeset via tla replay)
> 
>     > This is primarily because arch is doing an inventory of the source tree
>     > before each changeset, my patch changes things to inventory only the
>     > files touched by the changeset instead.  It sends a table of the
>     > candidate files to the inventory funcs, and this brings the time to
>     > replay my 100 changesets to ~4 seconds.
> 
> Holy crap!  Really?

Yeah, those are official numbers from "time tla replay" ;-)  It's a fast
machine though....before we spend lots of time fixing it we should make
sure it helps more than just me.

> 
>     > This is lightly tested (make test and a few others), 
> 
> This is the kind of change that needs to be made very carefully.
> 
> A quick scan of the technique you used suggests to me that it is
> certainly not correct.  In particular, it will not work properly for
> _merges_ even though it is either right or close-to-right for _exact_
> patching (such as when building a revision).  Note that `replay'
> (normally) counts as a merge command -- only when it is invoked from
> `update' would we believe a priori that it is doing exact patching.
> 
Sorry, sounds like I need to understand arch merging better.  How is it
different?

> So there's four things here:
> 
> 1) needs more testing
>
> 2) minimally, the optimization needs to be only sometimes used -- only
>    when we know that a changeset is being applied exactly rather than
>    as part of a merge
> 
> 3) maximally, _perhaps_ its worth trying to think about how to
>    generalize the hack to handle inexact patching.   I'm not so 
>    sure it is though -- you can just use `update' rather than `replay' 
>    and meanwhile, even without generalization, the hack can speed up
>    (dramatically, apparently) `get' and other cases of building a 
>    revision from changesets

Makes sense.  I'd like to see it handle merging right, mostly because I
do a lot of merging between various trees.

> 
> 4) since this appears to be a huge win, performance-wise, it might 
>    be interesting to take a slightly different approach that would
>    be harder to code up but would get inexact patching right:
> 
>    Instead of trying to infer "what files to inventory" from the 
>    contents of the changeset, an alternative is to do a full inventory
>    once, for the first changeset, but then keep track of what parts
>    of the filesystem are being changed along the way.

I almost tried this from a slightly different angle.  We actually
already have a coherent cache of how each tree looks at any given time,
it's called the filesystem ;-)  Arch isn't using that cache very well
because it doesn't limit queries to only the info it needs to know.

Instead of changing the inventory funcs to inventory a smaller list of
files, I could have changed arch_apply_changeset to check the filesystem
everywhere it currently reads through the inventory tables.  The end
result would not have called arch_changeset_inventory at all, but it
would have been a larger (more difficult) patch.

-chris






reply via email to

[Prev in Thread] Current Thread [Next in Thread]