Re: [Gnu-arch-users] [PATCH] arch speedups on big trees


From: Chris Mason
Subject: Re: [Gnu-arch-users] [PATCH] arch speedups on big trees
Date: Wed, 28 Jan 2004 17:51:46 -0500

On Wed, 2004-01-28 at 17:09, Tom Lord wrote:

> Arch certainly has to stat all of the files in a project tree that
> it's inventorying.   It has to read all of the directories.   This is
> semantically mandatory.
> 
> But that should be pretty economical.  Even 15,000 stats, if the
> inodes are in the cache, and a few K hundred directory reads, if those
> are in the cache, should be pretty damn cheap.  

It might be if arch only did it once.  Things like trusting the inode
sig would help cut down on the number of times it gets done.  Snapping
the inode sig is pretty expensive right now, so if it were kept up to
date in RAM during common operations and then written once at the end,
things would probably be much better.
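
A rough sketch of the "keep it in RAM, write it once" idea, in Python rather than tla's C.  Everything here is hypothetical: the InodeSigCache class, the (inode, mtime, size) tuple used as a signature, and the tab-separated file layout are assumptions for illustration, not tla's actual inode signature format.

```python
import os


class InodeSigCache:
    """Hypothetical in-memory inode signature table, written out once."""

    def __init__(self, sig_path):
        self.sig_path = sig_path
        self.sigs = {}      # relative path -> (inode, mtime_ns, size)
        self.dirty = False
        self.writes = 0     # how many times the table was written to disk

    @staticmethod
    def snap(path):
        # One lstat per file; the tuple is an assumed stand-in for a real sig.
        st = os.lstat(path)
        return (st.st_ino, st.st_mtime_ns, st.st_size)

    def update(self, root, relpath):
        # Refresh one entry in memory only; nothing touches the disk here.
        sig = self.snap(os.path.join(root, relpath))
        if self.sigs.get(relpath) != sig:
            self.sigs[relpath] = sig
            self.dirty = True

    def flush(self):
        # Write the whole table once, and only if something changed.
        if not self.dirty:
            return
        tmp = self.sig_path + ".tmp"
        with open(tmp, "w") as f:
            for relpath in sorted(self.sigs):
                ino, mtime, size = self.sigs[relpath]
                f.write(f"{relpath}\t{ino}\t{mtime}\t{size}\n")
        os.replace(tmp, self.sig_path)  # rename over the old file
        self.dirty = False
        self.writes += 1
```

The point is that update() can be called as often as an operation likes, while flush() runs once at the end and is a no-op when nothing changed.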

> Even cold, it's
> decreasingly horrible over economic time.  There are a class of user
> who want to work on trees of that size on what is, by today's
> standards, pretty dinky hardware.  But that class as a % of users is
> going to do nothing but shrink rapidly over time.  Meanwhile, most
> people doing serious and sustained work on trees of that size, should
> be able to afford traversing their project trees even today.
>
Well, my benchmarks were on an amd64 machine with 1GB of RAM and 2 CPUs.
It wasn't the hardware making for 7-minute replay times on only 100
changesets ;-)

> Arch _somewhat_ needs to traverse revision library trees and
> pristines.  It depends on how much you trust those trees.  Traversing
> them allows the inode signature to be validated.  It'd be a little
> weird -- but I wouldn't object to an option that told tla "just trust
> the revision library (at your own, slight, risk)".  As an option,
> people could at least use that in a pretty reliable way.
> 
Makes sense.
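
To make the tradeoff concrete, here is a minimal Python sketch of validating a tree against stored inode signatures, with a "trust" switch that skips the traversal entirely.  The function names, the trust parameter, and the (inode, mtime, size) signature are all assumptions made up for this example; tla's real signatures and options may look quite different.

```python
import os


def snap_sig(path):
    # Assumed stand-in for an inode signature: one lstat per file.
    st = os.lstat(path)
    return (st.st_ino, st.st_mtime_ns, st.st_size)


def changed_files(root, stored_sigs, trust=False):
    """Return the relative paths whose current stat differs from the
    stored signature.  With trust=True, skip the traversal and report
    nothing, at the caller's own (slight) risk."""
    if trust:
        return []
    changed = []
    for relpath, sig in stored_sigs.items():
        try:
            current = snap_sig(os.path.join(root, relpath))
        except FileNotFoundError:
            current = None  # a missing file counts as changed
        if current != sig:
            changed.append(relpath)
    return sorted(changed)
```

With trust=False this costs one stat per recorded file; with trust=True it costs nothing, but a tree modified behind arch's back goes unnoticed.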

> The cost of an inventory, though, _can_ be much more than just a
> stat'ing traversal.   If the inode cache is stale, an implicit tree
> will have to examine lots of source files (their contents, not their
> inodes).   _Currently_, and this is what I'm suggesting you fix,
> explicit ".id" files must be read for every (tag reading) inventory.
> 
Also makes sense.

> Inode signature files are pretty small.   It doesn't take much to read
> them.  You can reverse them and index them 7 different ways from
> tuesday in-core at low expense.   If you're convinced that inode sigs
> are all you need -- well, that's already done.
> 
If arch used the inode sigs alone without doing a stat on every file in
the tree, I think it would work well enough.  

-chris





