emacs-devel

Re: Using __builtin_expect (likely/unlikely macros)


From: Alex Gramiak
Subject: Re: Using __builtin_expect (likely/unlikely macros)
Date: Tue, 16 Apr 2019 14:50:40 -0600
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)

Paul Eggert <address@hidden> writes:

> That being said, it might make sense for a few obviously-rarely-called
> functions like 'emacs-abort' to be marked with __attribute__ ((cold)),
> so long as we don't turn this into a mission to mark all cold functions
> (which would cost us more than it would benefit). That is what GCC
> itself does, with its own functions. However, I'd like to see
> performance figures. Could you try it out on the benchmark of 'cd lisp
> && time make compile-always'?

Right, I agree that if used, they should be used sparingly. I tested
three versions a few times each with both 'make' and 'make -j4':

a) Regular Emacs master.
b) The diff below, with only the _Cold attribute.
c) The diff below, with both the _Cold and _Hot attributes.

a) Normal
'make':
real    4:17.97s
user    3:57.18s
sys     20.394s

'make -j4':
real    1:17.67s
user    4:23.78s
sys     18.888s

b) Cold
'make':
real    4:10.92s
user    3:50.34s
sys     20.178s

'make -j4':
real    1:15.77s
user    4:16.73s
sys     18.943s

c) Hot/Cold
'make':
real    4:11.43s
user    3:51.07s
sys     19.961s

'make -j4':
real    1:16.01s
user    4:17.63s
sys     18.662s

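So not much of a difference: the Cold build is about 2.7% faster than
master in real time with 'make' and about 2.4% faster with 'make -j4'.
For some reason the Hot/Cold build performed consistently slightly
worse than Cold alone.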

I also tested startup/shutdown with perf:

 Performance counter stats for '../emacs-normal -f kill-emacs' (20 runs):

            762.17 msec task-clock:u              #    0.844 CPUs utilized            ( +-  0.23% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            12,941      page-faults:u             #    0.017 M/sec                    ( +-  0.01% )
     2,998,322,125      cycles:u                  #    3.934 GHz                      ( +-  0.06% )
     1,392,869,413      stalled-cycles-frontend:u #   46.45% frontend cycles idle     ( +-  0.15% )
       982,206,843      stalled-cycles-backend:u  #   32.76% backend cycles idle      ( +-  0.18% )
     4,874,186,825      instructions:u            #    1.63  insn per cycle
                                                  #    0.29  stalled cycles per insn  ( +-  0.01% )
     1,037,929,374      branches:u                # 1361.802 M/sec                    ( +-  0.01% )
        17,930,471      branch-misses:u           #    1.73% of all branches          ( +-  0.16% )
     1,209,539,215      L1-dcache-loads:u         # 1586.960 M/sec                    ( +-  0.01% )
        42,346,229      L1-dcache-load-misses:u   #    3.50% of all L1-dcache hits    ( +-  0.05% )
         9,088,647      LLC-loads:u               #   11.925 M/sec                    ( +-  0.29% )
   <not supported>      LLC-load-misses:u

           0.90325 +- 0.00441 seconds time elapsed  ( +-  0.49% )



 Performance counter stats for '../emacs.cold -f kill-emacs' (20 runs):

            755.94 msec task-clock:u              #    0.845 CPUs utilized            ( +-  0.24% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            12,941      page-faults:u             #    0.017 M/sec                    ( +-  0.01% )
     2,976,036,365      cycles:u                  #    3.937 GHz                      ( +-  0.06% )
     1,374,451,779      stalled-cycles-frontend:u #   46.18% frontend cycles idle     ( +-  0.14% )
       990,227,732      stalled-cycles-backend:u  #   33.27% backend cycles idle      ( +-  0.18% )
     4,878,661,927      instructions:u            #    1.64  insn per cycle
                                                  #    0.28  stalled cycles per insn  ( +-  0.00% )
     1,038,495,525      branches:u                # 1373.782 M/sec                    ( +-  0.00% )
        17,859,906      branch-misses:u           #    1.72% of all branches          ( +-  0.16% )
     1,209,345,531      L1-dcache-loads:u         # 1599.792 M/sec                    ( +-  0.00% )
        42,444,358      L1-dcache-load-misses:u   #    3.51% of all L1-dcache hits    ( +-  0.06% )
         9,204,368      LLC-loads:u               #   12.176 M/sec                    ( +-  0.41% )
   <not supported>      LLC-load-misses:u

           0.89430 +- 0.00217 seconds time elapsed  ( +-  0.24% )


 Performance counter stats for '../emacs.hot-cold -f kill-emacs' (20 runs):

            761.97 msec task-clock:u              #    0.845 CPUs utilized            ( +-  0.20% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            12,947      page-faults:u             #    0.017 M/sec                    ( +-  0.01% )
     2,989,750,359      cycles:u                  #    3.924 GHz                      ( +-  0.04% )
     1,383,312,275      stalled-cycles-frontend:u #   46.27% frontend cycles idle     ( +-  0.12% )
       994,643,853      stalled-cycles-backend:u  #   33.27% backend cycles idle      ( +-  0.13% )
     4,879,318,990      instructions:u            #    1.63  insn per cycle
                                                  #    0.28  stalled cycles per insn  ( +-  0.00% )
     1,038,584,045      branches:u                # 1363.022 M/sec                    ( +-  0.00% )
        17,863,736      branch-misses:u           #    1.72% of all branches          ( +-  0.13% )
     1,209,327,347      L1-dcache-loads:u         # 1587.103 M/sec                    ( +-  0.00% )
        42,501,374      L1-dcache-load-misses:u   #    3.51% of all L1-dcache hits    ( +-  0.05% )
         9,201,311      LLC-loads:u               #   12.076 M/sec                    ( +-  0.28% )
   <not supported>      LLC-load-misses:u

           0.90132 +- 0.00201 seconds time elapsed  ( +-  0.22% )


This again shows a slight improvement with the Cold attributes (about
1% in elapsed time), and the Hot attributes still appear to cost
performance relative to Cold alone. Perhaps I was overzealous with the
hot tagging?
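
For readers without the attachment, here is a rough sketch of the kind of
annotation being discussed. It is not taken from the diff: the macro names
ATTR_COLD, ATTR_HOT, likely and unlikely, and the functions fatal and
sum_nonnegative, are hypothetical stand-ins for illustration, and the real
_Cold and _Hot definitions may well differ. It assumes a compiler that
defines __GNUC__ (GCC or Clang) and falls back to no-ops otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical macro names, standing in for the _Cold/_Hot macros
   mentioned in this message; their real definitions are not shown here.  */
#if defined __GNUC__
# define ATTR_COLD __attribute__ ((cold))
# define ATTR_HOT __attribute__ ((hot))
# define likely(x) __builtin_expect (!!(x), 1)
# define unlikely(x) __builtin_expect (!!(x), 0)
#else
# define ATTR_COLD
# define ATTR_HOT
# define likely(x) (x)
# define unlikely(x) (x)
#endif

/* Rarely called error path.  The cold attribute asks the compiler to
   optimize it for size and to treat branches leading to it as unlikely.  */
ATTR_COLD void
fatal (const char *msg)
{
  fprintf (stderr, "fatal: %s\n", msg);
  abort ();
}

/* Frequently called function (hypothetical example).  The hot attribute
   asks the compiler to optimize it more aggressively.  */
ATTR_HOT long
sum_nonnegative (const int *xs, size_t n)
{
  assert (xs != NULL || n == 0);
  long total = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* Hint that the error branch is rarely taken.  */
      if (unlikely (xs[i] < 0))
        fatal ("negative element");
      total += xs[i];
    }
  return total;
}
```

Because fatal is already marked cold, the explicit unlikely hint on the
branch that calls it is probably redundant with GCC; it is shown only to
illustrate the __builtin_expect form from the thread subject.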

Attachment: hot-cold.diff
Description: hot/cold

