Re: [Gnu-arch-users] [PATCH] arch speedups on big trees

From:	Chris Mason
Subject:	Re: [Gnu-arch-users] [PATCH] arch speedups on big trees
Date:	Wed, 28 Jan 2004 12:34:09 -0500

On Wed, 2004-01-28 at 11:42, Tom Lord wrote:

[ maintain a reverse mapping for ids to file names ]

> I don't see how such a mapping can possibly work.  That is, the step
> that says "maintain a reverse mapping [...]" sounds to me like "use
> magic."
> 
;-) for explicit ids, it's not hard at all.

> The format of the mapping, and whether it is forwards or backwards,
> doesn't matter at all.   If you can maintain that mapping accurately,
> that is the same thing as being able to maintain a complete inventory
> (in any format).   You might as well just say that {arch}/++inventory
> will always contain the current tree inventory.
> 
Effectively, it does, just in a format where you don't have to read the
entire database to find out about one file or id.  Finding the id for a
file or the file for an id should be an indexed operation.
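A minimal sketch of the kind of indexed lookup described above, assuming
explicit ids and an in-memory structure.  The class name IdIndex and its
methods are hypothetical, not part of tla; the real patch may store the
mapping quite differently.

```python
class IdIndex:
    """Two-way map between inventory ids and tree-relative paths.

    Hypothetical illustration only: both directions are plain dicts,
    so either lookup is a single keyed access rather than a scan.
    """

    def __init__(self):
        self._path_by_id = {}
        self._id_by_path = {}

    def add(self, file_id, path):
        # Refuse duplicates so the two dicts can never disagree.
        if file_id in self._path_by_id or path in self._id_by_path:
            raise ValueError("id or path already indexed")
        self._path_by_id[file_id] = path
        self._id_by_path[path] = file_id

    def remove(self, path):
        # Drop the entry from both directions and return its id.
        file_id = self._id_by_path.pop(path)
        del self._path_by_id[file_id]
        return file_id

    def rename(self, old_path, new_path):
        # Check the target first so a failed rename leaves the index intact.
        if new_path in self._id_by_path:
            raise ValueError("target path already indexed")
        file_id = self.remove(old_path)
        self.add(file_id, new_path)

    def path_for(self, file_id):
        return self._path_by_id.get(file_id)

    def id_for(self, path):
        return self._id_by_path.get(path)
```

Keeping both directions in step on every add, remove, and rename is the
whole cost of the scheme; it only holds if every change to the tree goes
through these calls.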

> How can you possibly maintain that inventory, though?   For example,
> if I edit a file and add a tagline, the inventory is out of date.   If
> I delete, rename, or create a file without using `tla add' and similar
> commands, then the inventory is out of date.
> 
Speed usually costs flexibility.  If you want it fast, use the proper
commands and don't use taglines.  For explicit ids, tree lint already
catches files without ids and other such things, so there are already
rules in place.  My code would have to be redone to have tagline-based
trees do full inventories all the time.

> There may be some _minor_ advantage to caching and maintaining an
> inventory _internally_ in some circumstances (for example, to carry
> over from one changeset application to the next when replaying several
> in a row), but even that will be so hard to get right that I question
> its utility.  (What happens if, for example, someone uses an extended
> version of patch that can change the tree in ways tla doesn't know
> about?)
> 

Not sure what you mean here.  tla changes and tla commit (without
--file-list) still do full inventories in my code, so any strange patch
program will work fine.  Applying revisions inside of arch uses the
known changeset format, so we know which files and ids need to be
inventoried.
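To illustrate the point about known changesets, here is a rough sketch
of updating an id -> path map by visiting only the ids a changeset
names.  The dict layout used for the changeset ("removed", "renamed",
"added") is an assumption made up for this example; it is not the arch
changeset format.

```python
def apply_changeset_to_index(path_by_id, changeset):
    """Update path_by_id in place and return the set of ids touched.

    Hypothetical changeset layout (an assumption for this sketch):
      "removed": list of ids
      "renamed": dict of id -> new path
      "added":   dict of id -> path
    Only the ids named in the changeset are visited; the rest of the
    map is never walked.
    """
    touched = set()
    for file_id in changeset.get("removed", []):
        del path_by_id[file_id]
        touched.add(file_id)
    for file_id, new_path in changeset.get("renamed", {}).items():
        if file_id not in path_by_id:
            raise KeyError(file_id)
        path_by_id[file_id] = new_path
        touched.add(file_id)
    for file_id, path in changeset.get("added", {}).items():
        if file_id in path_by_id:
            raise ValueError("id already present: " + file_id)
        path_by_id[file_id] = path
        touched.add(file_id)
    return touched
```

The returned set is what would need to be re-inventoried afterwards;
everything else in the tree is assumed unchanged.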

> 
>     > 2) add --link and --replace for tla add-pristine.  Having a hard linked
>     > pristine tree makes commits faster, since the commit updates the
>     > pristine tree as the last step.  The replace option lets you update an
>     > existing pristine tree to a higher patch level without having to
>     > inventory it again, it can make a big difference during star-merge.
> 
>     > I seem to remember a post where you talked about pristine trees being
>     > dead, in my mind they are basically a private library.  It might be a
>     > good idea later on to generalize them as such.
> 
> Pristine trees already have an inventory cache and are reused
> implicitly in some circumstances.   If the logic of those features
> isn't working for some case, that should be fixed, but I don't think
> that new options to add-pristine.
> 
They are working fine.  --link just makes pristine trees hard linked to
the same library that the source is linked to, which makes comparisons
much faster.  --replace lets you explicitly request that revisions be
applied on top of an existing pristine tree instead of a new one being
created (helps a lot during star-merge).

> Generalizing/replacing pristines to make them more literally a
> tree-specific revision library strikes me as a much more practical
> idea.   What do you think of this idea:
> 
> 1) Permit a "special" element in library paths that means "the library
>    in my current project tree".    Tla can create the library
>    directory on demand (whenever it asks for the library path from
>    within a given project tree).
> 
> 2) By default, such libraries should be greedy and sparse.
> 
> 3) It might be worth considering an option to make libraries "sliding"
>    which means that new trees are formed from old by re-use rather
>    than by linking.   This would be tricky to get right and not safe
>    for concurrent use.   Perhaps it could be combined with a locking
>    protocol for library revisions.
> 
#3 makes a huge difference for performance.  Once you can apply a
revision to a tree without a full inventory, it becomes much less
expensive than creating a whole new hardlinked library.  #1 and #2 make
sense.

> 
>     > 3) Avoid inode signatures for everything except library revisions. 
>     > Since taking an inode signature involves a whole tree inventory, we
>     > should only take them when we know we're going to read them at least
>     > twice before snapping them again.  Otherwise, the inode sig is a net
>     > loss in speed
> 
> I use what-changed pretty frequently.   I _think_ I read my
> non-library inode signatures more than twice, on average.

You talked me into inode sigs earlier.  We just need a way to update
them after applying a revision that doesn't involve a whole tree
inventory or reading/writing the entire inode sig file just to
read/write one id.
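One possible shape for per-id signature updates, sketched with Python's
standard dbm module as a keyed store so that a single id can be read or
written without touching the rest.  The signature fields (device, inode,
mtime, size), the function names, and the use of dbm are all assumptions
for illustration; tla's actual inode signature file format is different.

```python
import dbm
import os


def inode_signature(path):
    """Cheap fingerprint of a file taken from stat data alone."""
    st = os.stat(path)
    return f"{st.st_dev}:{st.st_ino}:{st.st_mtime_ns}:{st.st_size}".encode()


def refresh_signatures(db_path, tree_root, touched):
    """Re-snap signatures only for the ids in touched (id -> relative path)."""
    with dbm.open(db_path, "c") as db:
        for file_id, rel_path in touched.items():
            full_path = os.path.join(tree_root, rel_path)
            db[file_id.encode()] = inode_signature(full_path)


def is_unchanged(db_path, tree_root, file_id, rel_path):
    """True when the stored signature for one id matches the file on disk."""
    with dbm.open(db_path, "c") as db:
        stored = db.get(file_id.encode())
    if stored is None:
        return False  # never snapped, so treat as possibly changed
    return stored == inode_signature(os.path.join(tree_root, rel_path))
```

After applying a revision, refresh_signatures would be called with just
the ids the changeset touched, leaving every other stored signature as
it was.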

-chris
