Re: [Emacs-diffs] master 064701d 2/3: Increase the obarray size.
From: Ken Raeburn
Subject: Re: [Emacs-diffs] master 064701d 2/3: Increase the obarray size.
Date: Sat, 31 Dec 2016 15:54:58 -0500
> On Dec 31, 2016, at 09:18, Stefan Monnier <address@hidden> wrote:
>
>> In a typical GNU/Linux/X11 build, we wind up with over 15k symbols by
>> the time we've started. The old obarray size ensured an average chain
>> length of 10 or more.
>> * src/lread.c (OBARRAY_SIZE): Increase to 15121.
>
> Have you checked the performance impact? E.g. compared to a length of 8K?
I got a slight improvement over the old setting when loading dumped.elc, not a
huge one, but I didn’t compare against a value in the 8k neighborhood.
I’m not sure what a better test might be. I suppose I could create a test case
with a list of strings matching the interned symbols found in dumped.elc in
content (depends on the configuration), number of repetitions (depends on the
#N# hack I recently pushed to the scratch/raeburn-startup branch), and ordering
(depends on the obarray size via mapatoms, of course!), and test *just*
interning those strings in a new obarray preloaded with the symbols created in
the Emacs C code, without the rest of the reader code. But I’m not sure that
symbol set is an optimal test; certainly, if we don’t go the big-elc route, the
timing for that specific set of intern operations won’t be something worth
optimizing for. And such a test strips out the cache contention and other
effects of reading bytes and checking for Unicode input and allocating cons
cells and so on, for better or worse.
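Roughly, I’m picturing something along these lines (a much simplified sketch, not the real test: it reuses whatever happens to be interned in the running Emacs rather than the dumped.elc contents, it doesn’t preload the C-created symbols, and the function name and the sizes to compare are only illustrative):

  (require 'benchmark)

  ;; Collect the names of the currently interned symbols, then time
  ;; interning them all into a fresh obarray of the given size.
  ;; (A zero-filled vector is a valid obarray.)
  (defun test-intern-time (size)
    (let ((names nil)
          (ob (make-vector size 0)))
      (mapatoms (lambda (sym) (push (symbol-name sym) names)))
      (benchmark-run 1
        (dolist (name names)
          (intern name ob)))))

  ;; Compare, e.g., (test-intern-time 15121) against sizes in the
  ;; 8K and 1.5K neighborhoods.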
> I do believe 15K is better than 1.5K, but an average length of 1 sounds
> like the hash array might end up being larger than the optimal size.
It’s 1 if you look at all the slots, but about 37% of the slots should remain
empty (it’s a standard balls-in-bins question in probability), putting all the
symbols (and all the symbol lookups) in the other 63%, so the slots actually
used have an average length of about 1.6.
Of course, this ignores uneven distribution of the hash function over the
actual, real-world set of symbol names, and since the symbols aren’t equally
likely in the input, chain length is only a rough approximation for actual
run-time cost. We might be much more affected by where “nil” winds up in its
symbol chain than by where “describe-mode” winds up in its chain. Also, in the
macOS Nextstep build, it’s more like 21k symbols at startup (with roughly a
quarter of the slots empty and a nonempty-chain length around 1.9, I think).
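For reference, the arithmetic behind those figures is the usual balls-in-bins estimate: with N symbols hashed uniformly into M slots, about exp(-N/M) of the slots stay empty, and the nonempty slots average (N/M) / (1 - exp(-N/M)) symbols each. A quick sanity check, with round numbers standing in for the actual symbol counts:

  ;; Balls-in-bins estimate of obarray occupancy: expected empty
  ;; fraction and average chain length over the nonempty slots, for
  ;; N symbols in M slots.  15000 and 21000 below are round-number
  ;; stand-ins for the real startup counts.
  (defun obarray-load-estimate (n m)
    (let* ((load (/ (float n) m))
           (empty (exp (- load))))
      (list (* 100 empty)               ; percent of slots left empty
            (/ load (- 1.0 empty)))))   ; average nonempty-chain length

  (obarray-load-estimate 15000 15121)   ; roughly (37.1 1.58), X11 build
  (obarray-load-estimate 21000 15121)   ; roughly (24.9 1.85), NS build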
So, yeah, the initial utilization level is perhaps a bit low. In my testing
with the Linux “perf” tools, it appeared that examining symbols — actually,
reading the string size fields from memory — was where the CPU tended to stall
and waste most of its cycles, so I was aiming to keep the chains short, at a
bit of a cost in memory size and cache contention in the obarray itself. I’m
happy to run some more tests and look for a better size, if you think it’s
important.
Really, I think an obarray should be an opaque object able to resize itself
automatically (a hash table?) or reorganize itself (a tree?), rather than
pretending to be sort of a fixed-size array with some symbols visible and
others hiding invisibly inside it. But it doesn’t seem crucial enough to
performance to actually do anything about right now.
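Just to illustrate the kind of interface I mean (the my-obarray-* names are made up, and a real version would of course live in the C reader, not in Lisp), a name-to-symbol hash table already behaves roughly the way I’d want:

  ;; Sketch only: an "obarray" as an opaque, self-resizing
  ;; name->symbol table.
  (defun my-obarray-make ()
    (make-hash-table :test #'equal))

  (defun my-obarray-intern (name table)
    (or (gethash name table)
        (puthash name (make-symbol name) table)))

  (defun my-obarray-mapatoms (fn table)
    (maphash (lambda (_name sym) (funcall fn sym)) table))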
Ken