[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: strcache scaling issue
From: |
Ralf Wildenhues |
Subject: |
Re: strcache scaling issue |
Date: |
Sun, 23 Jan 2011 09:59:15 +0100 |
User-agent: |
Mutt/1.5.20 (2010-08-04) |
Hi Paul,
* Paul Smith wrote on Sat, Jan 22, 2011 at 10:27:01PM CET:
> Hi Ralf, sorry for the delay in reply.
No worries.
> On Sun, 2011-01-09 at 08:50 +0100, Ralf Wildenhues wrote:
> > Just breaking out of the for loop in the if-true case avoids the problem
> > for me. (The bulk of the patch is cleanup to remove the unneeded 'best'
> > variable.) Of course that only delays quadratic behavior as with filled
> > caches, all of them are still walked.
>
> Thanks for your investigations here; that's very helpful. I had also
> had some thoughts about improving the strcache in various ways. For
> example, I was thinking maybe about splitting it into two: one for
> filenames and one for variable names, since these strings very rarely
> overlap.
But a hash doesn't degrade in quality just because you put two different
kinds of things in it. I wouldn't go for more complexity here. On the
contrary, if you want an example where one big hash table is used for
several different kinds of objects very successfully, look at git.
> One other thing that would probably be useful is that someone posted
> some patches that changed the hashing in GNU make to use a PATRICIA tree
> instead of a standard hash (the hashing in GNU make is taken from
> idutils); I think this could help a lot in the fairly common case where
> there are lots of strings which share prefixes. I asked for copyright
> assignment for that code but I'm not sure where it stands.
Yes, this sounds like it might help. FWIW, I would only go that way if
there are use cases where this demonstrably helps, and the log(N)
behavior doesn't slow others down a lot.
> But as always, it's better to test and fix real issues rather than guess
> as to where your issues might be. I'll definitely look into making
> changes along the lines you suggest.
Thank you!
> > strcache_iscached is more worrisome with 14-fold increase. Looking at
> > its implementation, I wonder why it doesn't do a hash lookup rather
> > than looping over all the cache entries.
>
> I was thinking that doing simple pointer compares across the buffers
> would be not much slower, and even faster in simple cases, than the hash
> lookup, but obviously when the # of buffers gets large that's not true.
>
> On the other hand, the strcache_iscached() function is really only
> intended for debug/verification. It's called in an assert() (in file.c)
> I added during development of the string cache to be sure that strings
> expected to be in the cache, really were. For normal use that
> could/should be disabled. Perhaps I need a special flag controlling
> stuff like this. I could add a compiler flag to enable these checks in
> maintenance mode, maybe.
Sounds like a good idea.
Cheers,
Ralf