Paul Eggert <address@hidden> writes:
> That being said, it might make sense for a few obviously-rarely-called
> functions like 'emacs-abort' to be marked with __attribute__ ((cold)),
> so long as we don't turn this into a mission to mark all cold functions
> (which would cost us more than it would benefit). That is what GCC
> itself does, with its own functions. However, I'd like to see
> performance figures. Could you try it out on the benchmark of 'cd lisp
> && time make compile-always'?
Right, I agree that if used, they should be used sparingly. I tested
three versions a few times each with both 'make' and 'make -j4':
a) Regular Emacs master.
b) The below diff with only the _Cold attribute.
c) The below diff with both _Cold and _Hot attributes.
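For illustration only, here is a minimal sketch of what this kind of tagging can look like with GCC's function attributes. The macro names SKETCH_COLD and SKETCH_HOT and both functions are hypothetical stand-ins, not the _Cold/_Hot definitions from the diff, which may well differ.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the _Cold/_Hot macros discussed above;
   the actual definitions live in the diff and may differ.  */
#if defined __GNUC__
# define SKETCH_COLD __attribute__ ((cold))
# define SKETCH_HOT __attribute__ ((hot))
#else
# define SKETCH_COLD
# define SKETCH_HOT
#endif

/* Rarely called: GCC optimizes a cold function for size and treats
   paths leading to calls of it as unlikely.  */
SKETCH_COLD void
sketch_fatal (const char *msg)
{
  fprintf (stderr, "fatal: %s\n", msg);
  abort ();
}

/* Frequently called: GCC optimizes a hot function more aggressively.  */
SKETCH_HOT long
sketch_sum (const long *v, size_t n)
{
  if (!v)
    sketch_fatal ("null vector");   /* the unlikely path */
  long total = 0;
  for (size_t i = 0; i < n; i++)
    total += v[i];
  return total;
}
```

On many targets GCC also groups cold (and hot) functions into their own text subsections, which is where any locality benefit would come from.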
             'make'                          'make -j4'
             real      user      sys         real      user      sys
a) Normal    4:17.97s  3:57.18s  20.394s     1:17.67s  4:23.78s  18.888s
b) Cold      4:10.92s  3:50.34s  20.178s     1:15.77s  4:16.73s  18.943s
c) Hot/Cold  4:11.43s  3:51.07s  19.961s     1:16.01s  4:17.63s  18.662s
So not much of a difference: Cold is roughly 2-3% faster than master in
real time. For some reason, Hot/Cold performed consistently, if only
slightly, worse than Cold.
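To put numbers on the timings above, here is a small sketch of the arithmetic. The helper names to_seconds and percent_change are made up for this example.

```c
/* Convert a "minutes:seconds" reading such as 4:17.97 to seconds.  */
double
to_seconds (int minutes, double seconds)
{
  return 60.0 * minutes + seconds;
}

/* Relative change of VARIANT against BASELINE, in percent.
   A negative result means the variant was faster.  */
double
percent_change (double baseline, double variant)
{
  return 100.0 * (variant - baseline) / baseline;
}
```

For the real times, percent_change (to_seconds (4, 17.97), to_seconds (4, 10.92)) comes to about -2.7, and 1:17.67 against 1:15.77 to about -2.4.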
I also tested startup/shutdown with perf:
 Performance counter stats for '../emacs-normal -f kill-emacs' (20 runs):

            762.17 msec task-clock:u              #    0.844 CPUs utilized            ( +-  0.23% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            12,941      page-faults:u             #    0.017 M/sec                    ( +-  0.01% )
     2,998,322,125      cycles:u                  #    3.934 GHz                      ( +-  0.06% )
     1,392,869,413      stalled-cycles-frontend:u #   46.45% frontend cycles idle     ( +-  0.15% )
       982,206,843      stalled-cycles-backend:u  #   32.76% backend cycles idle      ( +-  0.18% )
     4,874,186,825      instructions:u            #    1.63  insn per cycle
                                                  #    0.29  stalled cycles per insn  ( +-  0.01% )
     1,037,929,374      branches:u                # 1361.802 M/sec                    ( +-  0.01% )
        17,930,471      branch-misses:u           #    1.73% of all branches          ( +-  0.16% )
     1,209,539,215      L1-dcache-loads:u         # 1586.960 M/sec                    ( +-  0.01% )
        42,346,229      L1-dcache-load-misses:u   #    3.50% of all L1-dcache hits    ( +-  0.05% )
         9,088,647      LLC-loads:u               #   11.925 M/sec                    ( +-  0.29% )
   <not supported>      LLC-load-misses:u

           0.90325 +- 0.00441 seconds time elapsed  ( +-  0.49% )

 Performance counter stats for '../emacs.cold -f kill-emacs' (20 runs):

            755.94 msec task-clock:u              #    0.845 CPUs utilized            ( +-  0.24% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            12,941      page-faults:u             #    0.017 M/sec                    ( +-  0.01% )
     2,976,036,365      cycles:u                  #    3.937 GHz                      ( +-  0.06% )
     1,374,451,779      stalled-cycles-frontend:u #   46.18% frontend cycles idle     ( +-  0.14% )
       990,227,732      stalled-cycles-backend:u  #   33.27% backend cycles idle      ( +-  0.18% )
     4,878,661,927      instructions:u            #    1.64  insn per cycle
                                                  #    0.28  stalled cycles per insn  ( +-  0.00% )
     1,038,495,525      branches:u                # 1373.782 M/sec                    ( +-  0.00% )
        17,859,906      branch-misses:u           #    1.72% of all branches          ( +-  0.16% )
     1,209,345,531      L1-dcache-loads:u         # 1599.792 M/sec                    ( +-  0.00% )
        42,444,358      L1-dcache-load-misses:u   #    3.51% of all L1-dcache hits    ( +-  0.06% )
         9,204,368      LLC-loads:u               #   12.176 M/sec                    ( +-  0.41% )
   <not supported>      LLC-load-misses:u

           0.89430 +- 0.00217 seconds time elapsed  ( +-  0.24% )

 Performance counter stats for '../emacs.hot-cold -f kill-emacs' (20 runs):

            761.97 msec task-clock:u              #    0.845 CPUs utilized            ( +-  0.20% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            12,947      page-faults:u             #    0.017 M/sec                    ( +-  0.01% )
     2,989,750,359      cycles:u                  #    3.924 GHz                      ( +-  0.04% )
     1,383,312,275      stalled-cycles-frontend:u #   46.27% frontend cycles idle     ( +-  0.12% )
       994,643,853      stalled-cycles-backend:u  #   33.27% backend cycles idle      ( +-  0.13% )
     4,879,318,990      instructions:u            #    1.63  insn per cycle
                                                  #    0.28  stalled cycles per insn  ( +-  0.00% )
     1,038,584,045      branches:u                # 1363.022 M/sec                    ( +-  0.00% )
        17,863,736      branch-misses:u           #    1.72% of all branches          ( +-  0.13% )
     1,209,327,347      L1-dcache-loads:u         # 1587.103 M/sec                    ( +-  0.00% )
        42,501,374      L1-dcache-load-misses:u   #    3.51% of all L1-dcache hits    ( +-  0.05% )
         9,201,311      LLC-loads:u               #   12.076 M/sec                    ( +-  0.28% )
   <not supported>      LLC-load-misses:u

           0.90132 +- 0.00201 seconds time elapsed  ( +-  0.22% )

This again shows a slight improvement with the Cold attribute (about 1%
in elapsed time), and still shows the Hot attribute degrading
performance relative to Cold alone. Perhaps I was overzealous with the
hot tagging?
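As a rough sanity check on differences this small, one can ask whether the elapsed-time ranges even overlap. The sketch below treats perf's "+-" figure as a plain error bar, which is an assumption about how to read it, and ranges_overlap is a made-up helper name.

```c
#include <stdbool.h>

/* True if the ranges MEAN_A +- ERR_A and MEAN_B +- ERR_B share any
   point.  ERR_A and ERR_B are taken as simple error bars.  */
bool
ranges_overlap (double mean_a, double err_a, double mean_b, double err_b)
{
  return mean_a - err_a <= mean_b + err_b
         && mean_b - err_b <= mean_a + err_a;
}
```

With the elapsed times above, the normal (0.90325 +- 0.00441) and Cold (0.89430 +- 0.00217) ranges do not overlap, while the normal and Hot/Cold (0.90132 +- 0.00201) ranges do. By this crude measure only the Cold build stands apart from master.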