bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Optimize three-valued comparison between integers


From: Florian Weimer
Subject: Re: Optimize three-valued comparison between integers
Date: Fri, 24 Jul 2020 08:46:10 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)

* Bruno Haible:

> A comment in Luca Saiu 'jitter' project [1] brought my attention to 3-valued
> comparison and the book "Hacker's Delight".
>
> ===============================================
> int sign1 (long n1, long n2)
> {
>   return (n1 > n2 ? 1 : n1 < n2 ? -1 : 0);
> }
>
> int sign2 (long n1, long n2)
> {
>   return (n1 < n2 ? -1 : n1 > n2);
> }
>
> int sign3 (long n1, long n2)
> {
>   return (n1 > n2) - (n1 < n2);
> }
> ===============================================
>
> Which of these, do you think, generates better code? The important fact,
> nowadays, is that conditional jumps cause the CPU to stall when it has
> mispredicted the jump. So, 1 or 2 more or less ALU instructions are
> irrelevant, but conditional jumps are not.
>
> You find conditional jumps in code (x86) produced by GCC versions
>
> sign1: 4.1.2 4.2.4 4.3.6 4.4.7 4.5.4 4.6.4 4.7.3 4.8.5 4.9.4 5.5.0 7.5.0 
> 8.4.0 9.3.0 10.1.0
> sign2: 4.1.2 4.2.4 4.3.6 4.4.7 7.5.0 8.4.0 9.3.0
> sign3: 
>
> So, the third variant comes out best.

On aarch64, sign1 and sign2 are one instruction shorter, and do not
contain branches, either.  On s390x, all three variants use conditional
branches, but the first one is the shortest.

There is also this:

int sign4 (long n1, long n2)
{
  return (char) ((n1 > n2) - (n1 < n2));
}

It's one instruction shorter on x86-64 than sign3, but it's worse on
other architectures.  (One of the x86-64 quirks is that conditional sets
do not affect the full register, only a byte.)

Thanks,
Florian




reply via email to

[Prev in Thread] Current Thread [Next in Thread]