bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] count-leading-zeros: new module


From: Ondřej Bílka
Subject: Re: [PATCH] count-leading-zeros: new module
Date: Sat, 11 Aug 2012 12:23:56 +0200
User-agent: Mutt/1.5.20 (2009-06-14)

On Sat, Aug 11, 2012 at 09:45:16AM +0200, Jim Meyering wrote:
> Eric Blake wrote:
> > I needed gcc's clz to determine the most significant bit of a
> > number (useful for things like truncating to a power of 2),
> > and was surprised it is not a standardized function (the
> > opposite direction of finding the least significant bit is
> ...
> > +/* Expand the code which computes the number of leading zeros of the local
> > +   variable 'x' of type TYPE (an unsigned integer type) and returns it
> > +   from the current function.  */
> > +#if 0 && __GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4)
> > +# define COUNT_LEADING_ZEROS(BUILTIN, TYPE)     \
> > +  return x ? BUILTIN (x) : CHAR_BIT * sizeof x;
> > +#else
> > +# define COUNT_LEADING_ZEROS(BUILTIN, TYPE)                             \
> > +  /* This condition is written so as to avoid shifting by more than     \
> > +     31 bits at once, and also avoids a random HP-UX cc bug.  */        \
> > +  verify (((TYPE) -1 >> 31 >> 31 >> 2) == 0); /* TYPE has at most 64 bits 
> > */ \
> > +  int count = 0;                                                        \
> > +  if (1 < (TYPE) -1 >> 31) { /* TYPE has more than 32 bits? */          \
> > +    count = count_leading_zeros_32 (x >> 31 >> 1);                      \
> > +    if (count < 32)                                                     \
> > +      return count;                                                     \
> > +  }                                                                     \
> > +  return count + count_leading_zeros_32 (x);
> > +
> > +/* Compute and return the number of leading zeros in the least
> > +   significant 32 bits of X. */
> > +static inline int
> > +count_leading_zeros_32 (unsigned int x)
> > +{
> > +  int count = 0;
> > +  if (x & 0xffff0000U)
> > +    x >>= 16;
> > +  else
> > +    count += 16;
> > +  if (x & 0xff00)
> > +    x >>= 8;
> > +  else
> > +    count += 8;
> > +  if (x & 0xf0)
> > +    x >>= 4;
> > +  else
> > +    count += 4;
> > +  if (x & 0xc)
> > +    x >>= 2;
> > +  else
> > +    count += 2;
> > +  if (x & 2)
> > +    x >>= 1;
> > +  else
> > +    count++;
> > +  if (!(x & 1))
> > +    count++;
> > +  return count;
> > +}
> > +#endif
> 
> Hi Eric,
> 
> Did you consider using a variant of the following?
> It looks like it would be more efficient.
> 
> [From http://graphics.stanford.edu/~seander/bithacks.html]
> 
>       Find the log base 2 of an N-bit integer in O(lg(N)) operations
>       with multiply and lookup
> 
>       uint32_t v; // find the log base 2 of 32-bit v
>       int r;      // result goes here
> 
>       static const int MultiplyDeBruijnBitPosition[32] =
>       {
>         0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
>         8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
>       };
> 
>       v |= v >> 1; // first round down[sic*] to one less than a power of 2
>       v |= v >> 2;
>       v |= v >> 4;
>       v |= v >> 8;
>       v |= v >> 16;
> 
>       r = MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27];
> 
>       The code above computes the log base 2 of a 32-bit integer with a
>       small table lookup and multiply. It requires only 13 operations,
>       compared to (up to) 20 for the previous method. The purely
>       table-based method requires the fewest operations, but this offers
>       a reasonable compromise between table size and speed.
> 
> [*] I've just reported the error in that comment: s/down/up/

Hi,
Currently fastest way is convert to float and extract exponent. This is
always log2(n) or log2(n)+1 which can be easily repaired.

Here is implementation:

#include <stdint.h>

inline int clz64(int64_t x){
  if (x<0) return 0;
  union convert {uint64_t i;double f;} un;
  un.f=(double) x;
  int64_t ret= (un.i >> (64-12)) - 1023;
  return 63-(ret - 1 + (x>>ret));
}

inline int clz32(int32_t x){
  if (x<0) return 0;
  union convert {uint32_t i;float f;} un;
  un.f=(float) x;
  int ret=(un.i >> (32-9)) - 127;
  return 31-(ret - 1 + (x>>ret));
}

inline int clz32_alt(uint32_t x){
  union convert {uint64_t i;double f;} un;
  un.f=(double) x;
  int ret =(un.i >> (64-12)) - 1023;
  return 31 - ret;
}


 gcc messes this code a bit, it generates following code:
  
cvtsi2sdq %rdi, %xmm0
movsd %xmm0, -8(%rsp)
movq  -8(%rsp), %rcx



-- 

the printer thinks its a router.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]