[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: strcache scaling issue

From: Ralf Wildenhues
Subject: Re: strcache scaling issue
Date: Sun, 23 Jan 2011 09:59:15 +0100
User-agent: Mutt/1.5.20 (2010-08-04)

Hi Paul,

* Paul Smith wrote on Sat, Jan 22, 2011 at 10:27:01PM CET:
> Hi Ralf, sorry for the delay in reply.

No worries.

> On Sun, 2011-01-09 at 08:50 +0100, Ralf Wildenhues wrote:
> > Just breaking out of the for loop in the if-true case avoids the problem
> > for me.  (The bulk of the patch is cleanup to remove the unneeded 'best'
> > variable.)  Of course that only delays quadratic behavior as with filled
> > caches, all of them are still walked.
> Thanks for your investigations here; that's very helpful.  I had also
> had some thoughts about improving the strcache in various ways.  For
> example, I was thinking maybe about splitting it into two: one for
> filenames and one for variable names, since these strings very rarely
> overlap.

But a hash doesn't degrade in quality just because you put two different
kinds of things in it.  I wouldn't go for more complexity here.  On the
contrary, if you want an example where one big hash table is used for
several different kinds of objects very successfully, look at git.

> One other thing that would probably be useful is that someone posted
> some patches that changed the hashing in GNU make to use a PATRICIA tree
> instead of a standard hash (the hashing in GNU make is taken from
> idutils); I think this could help a lot in the fairly common case where
> there are lots of strings which share prefixes.  I asked for copyright
> assignment for that code but I'm not sure where it stands.

Yes, this sounds like it might help.  FWIW, I would only go that way if
there are use cases where this demonstrably helps, and the log(N)
behavior doesn't slow others down a lot.

> But as always, it's better to test and fix real issues rather than guess
> as to where your issues might be.  I'll definitely look into making
> changes along the lines you suggest.

Thank you!

> > strcache_iscached is more worrisome with 14-fold increase.  Looking at
> > its implementation, I wonder why it doesn't do a hash lookup rather
> > than looping over all the cache entries.
> I was thinking that doing simple pointer compares across the buffers
> would be not much slower, and even faster in simple cases, than the hash
> lookup, but obviously when the # of buffers gets large that's not true.
> On the other hand, the strcache_iscached() function is really only
> intended for debug/verification.  It's called in an assert() (in file.c)
> I added during development of the string cache to be sure that strings
> expected to be in the cache, really were.  For normal use that
> could/should be disabled.  Perhaps I need a special flag controlling
> stuff like this.  I could add a compiler flag to enable these checks in
> maintenance mode, maybe.

Sounds like a good idea.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]