[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: strcache scaling issue
Re: strcache scaling issue
Sat, 22 Jan 2011 16:27:01 -0500
Hi Ralf, sorry for the delay in reply. As always your emails are
information-packed and it takes me a while to find the time to read them
with the appropriate amount of attention (I can whip off a response to
your typical request-for-help in just a few minutes, in contrast :-)
On Sun, 2011-01-09 at 08:50 +0100, Ralf Wildenhues wrote:
> Just breaking out of the for loop in the if-true case avoids the problem
> for me. (The bulk of the patch is cleanup to remove the unneeded 'best'
> variable.) Of course that only delays quadratic behavior as with filled
> caches, all of them are still walked.
Thanks for your investigations here; that's very helpful. I had also
had some thoughts about improving the strcache in various ways. For
example, I was thinking maybe about splitting it into two: one for
filenames and one for variable names, since these strings very rarely
overlap. Another thing that would be nice would be to combine the hash
tables so we don't have to have a filename table hashed on the filename
string, and a separate hash table just for the string cache. Ditto for
variable names (it's even more obvious for variable names since there's
only one "type"; for filenames it's a little trickier since not all
filenames have a struct filedef associated with them).
One other thing that would probably be useful is that someone posted
some patches that changed the hashing in GNU make to use a PATRICIA tree
instead of a standard hash (the hashing in GNU make is taken from
idutils); I think this could help a lot in the fairly common case where
there are lots of strings which share prefixes. I asked for copyright
assignment for that code but I'm not sure where it stands.
But as always, it's better to test and fix real issues rather than guess
as to where your issues might be. I'll definitely look into making
changes along the lines you suggest.
> strcache_iscached is more worrisome with 14-fold increase. Looking at
> its implementation, I wonder why it doesn't do a hash lookup rather
> than looping over all the cache entries.
I was thinking that doing simple pointer compares across the buffers
would be not much slower, and even faster in simple cases, than the hash
lookup, but obviously when the # of buffers gets large that's not true.
On the other hand, the strcache_iscached() function is really only
intended for debug/verification. It's called in an assert() (in file.c)
I added during development of the string cache to be sure that strings
expected to be in the cache, really were. For normal use that
could/should be disabled. Perhaps I need a special flag controlling
stuff like this. I could add a compiler flag to enable these checks in
maintenance mode, maybe.
Paul D. Smith <address@hidden> Find some GNU make tips at:
"Please remain calm...I may be mad, but I am a professional." --Mad Scientist