gnu-arch-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gnu-arch-users] [PATCH] arch speedups on big trees


From: Tom Lord
Subject: Re: [Gnu-arch-users] [PATCH] arch speedups on big trees
Date: Fri, 19 Dec 2003 15:08:02 -0800 (PST)


    A>>> Actually, that suggests a variation that makes more sense:

    A>>> When we're trying to modify or delete the file with id X, we
    A>>> don't really want to know the path of every file in the tree
    A>>> - just the one with id X. So, start by asking "Does the file
    A>>> at path foo have id X?"

    A>>> and if it does, stop. If it doesn't, scan outwards until we
    A>>> find it (simplified form: scan the whole tree).

    A>>> That'll require a bit of refactoring of the inventory
    A>>> interface, I think, but shouldn't be too hard. It should work
    A>>> for any changeset application.

    A>>> Adding a file is the exceptional case - we have to scan the
    A>>> whole tree for that. But it doesn't happen very often in
    A>>> general, so that's probably a reasonable cutoff point for the
    A>>> optimisation.

    T>> My hunch is that you're proposing a fairly large amount of
    T>> code work and resulting complexity for, after the
    T>> inode-sig-implies-id hack being extended to handle explicit
    T>> tags, a relatively small return.

    T>> I'd be much happier just to see a first cut at mason's
    T>> optimization that only tries to handle the exact-patching
    T>> case.

    A> I'd have thought that was actually harder to implement. What's
    A> so difficult about just running the inventory scan on a single
    A> named file?

mason's hack _does_ run inventory on single files (and short lists of
single files).   What it _doesn't_ do is "search outward" for files.

To turn mason's hack into an optimization that only applies when we
know we're exact patching is a pretty straightforward exercise in
adding a new parameter to arch_apply_changeset and propogating that
upwards to callers.  (In and of itself, that parameter
addition/propogation is not enough to make mason's optimization
correct -- see below.)

Overall, what we're talking about is a general class of optimizations
in which apply_changeset doesn't just blindly ask for a whole-tree
inventory of the target tree, but rather looks first at the changeset
and tries to use that to minimize the number of system calls needed
for inventory purposes by exploiting properties of the changeset.

Such optimizations require a fairly heavy proof before we can trust
them: they must prove that the apply_changeset algorithm is invarient
under them.  You haven't offered such a proof for your variation and,
in fact, your proposal to specially consider only modified or deleted
files is flat out wrong.

So part of the rather major difficulty with your proposal is simply to
formulate it correctly.   I suspect that once you do that, you'll find
that the computation you need to perform on the changeset is not going
to be especially trivial to implement and review.

It's even worse than that, though, and this speaks to the issue of why
mason's optimization is not correct even if it were extended with the
extra parameter:

First, mason's proposal doesn't correctly handle .arch-inventory
files.  That's just flat-out wrong (but, admittedly, a
straightforwardly solvable problem.)

Second (I'm afraid this is a bit obscure): Over the years of its
existence, mkpatch has changed in some subtle ways.  The contents of
the file indexes of orig and mod trees have expanded, mostly to better
handle cases of inexact patching (e.g. merging).  One of the lessons
learned as those changes were made is that the apply_changeset
tree-delta algorithm is very forgiving -- it works quite well given
changesets before or after the mkpatch changes.  What isn't yet clear
to me is whether or not mason's optimization works correctly only with
recent mkpatch output and not earlier.  I'd like to make sure it works
with both not only to preserve upward compatability with old archives,
but also to keep an extra layer of redundency in there against future
changes to mkpatch.

So basically: your proposed "quick fix" to mason's hack is completely
losing and you'll have to do a fair amount of work to fix your quick
fix before it can even be considered.

Meanwhile, as your own profiling suggests, the bulk of the speed-up
from mason's optimization can be achieved by other, more obviously
correct means.

-t





reply via email to

[Prev in Thread] Current Thread [Next in Thread]