
[Gnu-arch-users] Re: [PATCH] arch speedups on big trees


From: Chris Mason
Subject: [Gnu-arch-users] Re: [PATCH] arch speedups on big trees
Date: Sun, 11 Jan 2004 14:37:24 -0500

On Sat, 2004-01-10 at 04:17, Miles Bader wrote:
> On Fri, Jan 09, 2004 at 05:16:13PM -0500, Chris Mason wrote:
> > > You can't rely on your DB to catch any conflicts if you're running at a
> > > point where the user could have changed the tree, so it seems that you've
> > > _got_ to at least do a single full inventory at the start of a given
> > > user-level tla command.
> > 
> > See above, all tla commands used to change the tree also update the
> > mapping.
> 
> Um, that doesn't work for taglines (I know, you said you were only dealing
> with explicit tags right now, but I sort of thought it was just a few missing
> implementation details, not a fundamental incompatibility).

Well, I just pretended taglines didn't exist.  My plan for compatibility
with them was to make the reverse mapping optional ;-)

> 
> It also fails in the case of direct manipulation by users, whether
> intentional -- e.g. a user `mv's or `rm's a directory, something which is
> advertised to work even for explicit tags -- or by accidentally modifying a
> .arch-ids directory.  Certainly any id-tag mapping resulting from the DB
> needs to be verified before use (just like if you try to use the mapping you
> see in a changeset).
> 
You don't usually get speed without losing something; in this case it's
flexibility.  The best solution might be to have tree-lint check the
reverse mapping and allow the user to configure how often the tree lint
is done.  Clearly any time the whole inventory is done, the reverse
mapping could be verified and/or updated for smallish cost.

[ snip ]

> But I think any solution should work well for both taglines and explicit tags
> to the extent it can.
> 
> Just trying to organize my thoughts here:
> 
>   (1) For a tagline tree, it will have to do a full-tree inventory to get all
>       the taglines, so it doesn't seem to make sense to use the on-disk DB in
>       this case.  However there's no problem with taglines being represented
>       in an in-core version of the DB, so the explicit and tagline cases share
>       all the code after the initial fill-in-the-DB step; if the tagging
>       method is `explicit', just read the DB from disk into core, and if the
>       tagging method is `tagline', do a full-tree inventory to fill in the
>       (in-core only) DB.
> 
Nods.  It makes sense to have the reverse mapping optional, both for the
tagline case and the case where people with smaller trees want more
flexibility when using explicit ids.
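
A minimal sketch of point (1), to make the shared in-core DB concrete.  It
is not tla code: the names (Entry, load_db, read_disk_db, scan_taglines) and
the one-line-per-record "id<TAB>path" disk format are assumptions made for
illustration, and the tagline scan is simplified to reading whole files.
The verified flag is set the way point (3) below describes.

```python
import os
from dataclasses import dataclass


@dataclass
class Entry:
    path: str       # pathname relative to the tree root
    verified: bool  # True only if the mapping was checked against the tree


def read_disk_db(db_file):
    """Load a hypothetical on-disk DB: one 'id<TAB>path' record per line."""
    db = {}
    with open(db_file) as fh:
        for line in fh:
            tag, path = line.rstrip("\n").split("\t", 1)
            # Loaded from disk, so the user may have changed the tree since.
            db[tag] = Entry(path, verified=False)
    return db


def scan_taglines(root):
    """Full-tree inventory: read every file looking for an 'arch-tag:' line."""
    db = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            with open(full, errors="replace") as fh:
                for line in fh:
                    if "arch-tag:" in line:
                        tag = line.split("arch-tag:", 1)[1].strip()
                        # Just read from the file itself, so it is verified.
                        db[tag] = Entry(os.path.relpath(full, root), True)
                        break
    return db


def load_db(root, method, db_file=None):
    """Fill the in-core DB; everything after this step is shared code."""
    if method == "explicit":
        return read_disk_db(db_file)
    if method == "tagline":
        return scan_taglines(root)
    raise ValueError("unknown tagging method: " + method)
```

Making the reverse mapping optional would then amount to picking the
full-inventory branch even for an explicit tree.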

>   (2) I note that one problem with merging inode-sigs and your `DB' is that
>       inode-sigs are tied to (past) revisions, whereas your DB is tied to the
>       project tree.  Together with the fact that you _do_ want to keep inode
>       sig info even for tagline trees (whereas you don't want the `project
>       tree' DB on disk in this case), maybe merging the two concepts isn't
>       such a hot idea after all.
>
>       _However_, the inode-sigs could be useful when actually building the
>       in-core DB for taglines from a full-tree inventory: if the inode
>       information of a file is up-to-date (with respect to whatever
>       inode-signature info you happen to have lying around -- note that it
>       could be _several_ old inode-sigs files), then you can get the file's
>       tag without reading the actual file data, something which could be an
>       important optimization.
> 
I haven't put enough thought into the inode sig merge, but my intuition says
we can safely combine the two ideas into a useful optimization, even for
the tagline case.  For taglines, it would be more of an optimized inode
sig database, and the reverse mapping might be empty.
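
A rough sketch of the optimization from point (2): if a file's stat
information matches a remembered signature, its tag is reused without
opening the file.  The signature tuple (inode, size, mtime) and the names
stat_sig and tag_for are assumptions for illustration; they are not the
actual inode-sigs format.

```python
import os


def stat_sig(path):
    """Cheap signature taken from stat() alone; no file data is read."""
    st = os.stat(path)
    return (st.st_ino, st.st_size, st.st_mtime_ns)


def tag_for(path, inode_sigs, read_tag):
    """Return (tag, did_read_file).

    inode_sigs maps path -> (signature, tag).  read_tag is the expensive
    fallback that opens the file and extracts its tagline.
    """
    sig = stat_sig(path)
    cached = inode_sigs.get(path)
    if cached is not None and cached[0] == sig:
        return cached[1], False      # signature matches: skip the read
    tag = read_tag(path)             # changed or unknown file: read it
    inode_sigs[path] = (sig, tag)    # update only this one record
    return tag, True
```

Only the record for a changed file is rewritten, so the signature table is
updated in place rather than regenerated.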
 
>   (3) Every entry in the in-core DB has a `verified' flag, which is set to
>       false for explicit trees, and to true for tagline trees (since a
>       full-tree inventory was done to fill in the in-core DB).
> 
>   (4) When you need a id-tag <-> pathname mapping, you can look in the in-core
>       DB; if the verified flag is false, you gotta go actually look at the
>       disk to check things out, and if OK, you can use it (and set the flag
>       to true).  Otherwise I guess you have to toss the DB and do a full-tree
>       inventory to make a new one (fully verified this time).

Or just update the records that are incorrect/out of date.
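
A sketch of how points (3) and (4) might look with per-record repair
instead of tossing the whole DB.  Everything here is hypothetical: the
names Entry, read_explicit_id and lookup are invented, the check assumes an
explicit id lives in .arch-ids/<name>.id next to the file, and
find_by_inventory stands in for whatever slower search locates an id in
the tree.

```python
import os
from dataclasses import dataclass


@dataclass
class Entry:
    path: str
    verified: bool = False


def read_explicit_id(root, path):
    """Read the id recorded for path, or None if there is no id file."""
    directory, name = os.path.split(path)
    id_file = os.path.join(root, directory, ".arch-ids", name + ".id")
    try:
        with open(id_file) as fh:
            return fh.read().strip()
    except OSError:
        return None


def lookup(db, root, tag, find_by_inventory):
    """Map an id-tag to a pathname, verifying lazily on first use."""
    entry = db.get(tag)
    if entry is not None:
        if entry.verified:
            return entry.path
        # Unverified: go look at the disk before trusting the record.
        on_disk = os.path.exists(os.path.join(root, entry.path))
        if on_disk and read_explicit_id(root, entry.path) == tag:
            entry.verified = True
            return entry.path
    # Stale or missing record: repair just this one entry.
    path = find_by_inventory(tag)
    if path is None:
        db.pop(tag, None)            # the id is gone from the tree
        return None
    db[tag] = Entry(path, verified=True)
    return path
```

The full inventory becomes the fallback for the one bad record, not the
price of every lookup.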

>   (5) All operations update the DB, though for a tagline tree, the changes
>       are kept strictly within the in-core version.
> 
>   (6) Might the resulting up-to-date DB be useful for producing a new
>       inode-sig file at the end of the tla run?  [thus making
>       replay/update/etc update inode-sigs, which would be very useful]
> 
I think so, but my goal would be to never make a new inode-sig, just
update the changed records in the old one.

> Hmmm...
> 
> The main difference of what you've done from the above seems to be the lack
> of tagline support, and no support for verification (I've so far not looked
> at your code though).
> 

Correct, my code ignores taglines completely, and has no support for
verification.  I think the verification should include checksums of the
file, and be tied into the tree signing stuff, which I haven't thought
about much at all.  My primary goal here is just to convince people that
arch can be much faster on big trees, and generate discussions on the
best ways to do it.

> [Of course there are my complaints about the on-disk DB format, but yeah, I
> suppose those are orthogonal.]

;-)

-chris





