texmacs-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Texmacs-dev] Efficiency of C++ switch-statements


From: Gabriel Dos Reis
Subject: Re: [Texmacs-dev] Efficiency of C++ switch-statements
Date: 11 Nov 2003 03:34:30 +0100

"Dan Martens" <address@hidden> writes:

| --------- Original Message ---------
| 
| DATE: Mon, 10 Nov 2003 22:58:15
| From: Joris van der Hoeven <address@hidden>
| To: <address@hidden>
| Cc: 
| 
| >> Yes, it does use a table.
| >>
| 
| This is somewhat misleading. 

Yours is at least equally misleading.  See below.

| The compiler does not use a real "table" (ie, an array with
| entries), rather the binary search is implemented using compare
| instructions.  Ie. 

Not always.  The compiler (g++ or gcc) can use a table with entries or a
series of compares and jumps, that depends on the density of the labels.  
See  Message-ID: <address@hidden>

[...]

| So, no jump table is created, just a series of "smart" compare
| instructions to mimic a binary search and narrow decisions. This
| means that any search is O(log n). Much better than cascading if
| statements. 
| 
| I have made a very small test program:
| 
| #include <stdlib.h>
| 
| int main(){
| 
|   int test;
| 
|   test = rand();
| 
|   switch(test){
|   case 1: 
|   case 10: 
|   case 100: 
|   case 1000: 
|   case 10000: 
|   case 100000: 
|   case 1000000: test++; break;
|   case 2: 
|   case 20: 
|   case 200: 
|   case 2000: 
|   case 20000:
|   case 200000: 
|   case 2000000: test++; break;
|   case 3: 
|   case 30: 
|   case 300: 
|   case 3000: 
|   case 30000: 
|   case 300000: 
|   case 3000000: test++; break;
|   case 4: test++; break;
|   case 40: test++; break;
|   default: break;
|   }
| }

In your program, the low case is 1 and the highest is 3000000; you
have 24 labels.  And they aren't dense in the range 1..3000000.  So
according to the comment

                                                         A branch table
   is used if there are more than a few labels and the labels are  dense
   within the range between the smallest and largest case value.
 
GCC will not use a branch table.  Rather it will generate a series of
compares and jumps, following 

   The alternative to the use of a branch table is to generate a series
   of compare and jump insns.  When that is done, we use the LEFT, RIGHT,
   and PARENT fields to hold a binary tree.  Initially the tree is
   totally unbalanced, with everything on the right.  We balance the tree
   with nodes on the left having lower case values than the parent
   and nodes on the right having higher values.  We then output the tree
   in order.

which is what you're experimenting.

In short, your testcase does not testify that the compiler does not
generate real branch table tables.  It just concurs on the fact that
when the case labels are not dense then, the compiler sort the labels
and generate compares and jumps.

[...]

Now, consider the following:

     enum E {
       zero, one, two, three, four, five,
       six, seven, eight, nine, ten
     };

     extern E e;

     int main()
     {
       switch(e)
         {
         case zero:
           e = one;
           break;

         case one:
           e = zero;
           break;

         case two:
           e = ten;
           break;

         case three:
           e = zero;
           break;

         case four:
           break;

         case five:
           e = four;
           break;

         case six:
           e = two;
           break;

         case seven:
           e = one;
           break;

         case eight:
           e = nine;
             break;

         case nine:
         case ten:
           break;
         }
     }

On a Sparc target, here is the output (with some comments) I get

        .file   "t.C"
        .section        ".text"
        .align 4
        .global main
        .type   main, #function
        .proc   04
main:
.LLFB3:
        !#PROLOGUE# 0
        !#PROLOGUE# 1
        sethi   %hi(e), %o3
        ld      [%o3+%lo(e)], %g1
        cmp     %g1, 8          ! if e is greater than eight
        bgu     .LL2            ! we're done
        sll     %g1, 2, %g1     
        sethi   %hi(.LL14), %o5 ! else set the jump table .LL14
        or      %o5, %lo(.LL14), %o5    
        ld      [%o5+%g1], %o4  ! and load the corresponding entry
        jmp     %o4             ! jump to that address 
         nop
.LL5:
        mov     10, %g1
        b       .LL2
        st      %g1, [%o3+%lo(e)]
.LL10:
        mov     1, %g1
        b       .LL2
        st      %g1, [%o3+%lo(e)]
.LL6:
        b       .LL2
        st      %g0, [%o3+%lo(e)]
.LL11:
        mov     9, %g1
        b       .LL2
        st      %g1, [%o3+%lo(e)]
.LL9:
        mov     2, %g1
        b       .LL2
        st      %g1, [%o3+%lo(e)]
.LL8:
        mov     4, %g1
        st      %g1, [%o3+%lo(e)]
.LL2:
        retl
        mov     0, %o0
        .align 4
        .subsection     -1
        .align 4
.LL14:
        .word   .LL10           ! branch table starts here...
        .word   .LL6
        .word   .LL5
        .word   .LL6
        .word   .LL2
        .word   .LL8
        .word   .LL9
        .word   .LL10
        .word   .LL11           ! ...and ends here
        .previous
.LLFE3:
        .size   main, .-main
        .ident  "GCC: (GNU) 3.4 20030330 (experimental)"
 

As you can see, the compiler does generate actual lookup table with entries
that correspond to appropriate codes.

| >> > Another question: if I do not put the cases in the same order
| >> > as the enumeration, does this alter the efficiency?
| 
| No, and the above explains why.  If an array of jump instructions
| were used, then order would matter because the entries of the array
| would be incremental. 

Not really.  See above.

| >> Also consider using the STL hash_map object, which gives you O(1)
| >> lookups and does not waste space when your table is sparse.
| >
| 
| This is way more overhead than a switch statement.

I agree that a hash_map is an overkill approach to this kind of
things.  C++ provides for far better and more efficiency way to handle
this kind of things. 

|  Much more code
| is executed in the map object.  You have to remember that a switch
| just does assembly compare instructions.  No method calls are made
| (which are expensive), also the cmp instruction is fairly atomic and
| is not a CISC feature (on RISC also), so no microcode is used as is
| the mul and div instructions which are not atomic.  

But, a virtual function call may be cheaper than the series of
compares and jumps.  And in case the compiler would have used a branch
table, again the virtual function call may be cheaper.  

In the end there is a simple, efficient, much more maintainable
solution but nobody will use it because of persistent urban legend
;-). 

[...]

| As you can see above, holes in the cases make no difference.  A
| switch is by far the most efficent method to use, 

compared to a virtual function call, a switch is simpler onnly in a
very limited cases.

-- Gaby




reply via email to

[Prev in Thread] Current Thread [Next in Thread]