bug-gnulib

Re: bug in xtime.h


From: Bruno Haible
Subject: Re: bug in xtime.h
Date: Tue, 24 Dec 2019 10:34:07 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-169-generic; KDE/5.18.0; x86_64; ; )

Paul Eggert wrote:
> the remaining patches are for other instances of this idiom in Gnulib.

These other instances use 'int', not 'long long'. The machine code is similar:

===============================================================================
int sec1 (int t)
{ return (t < 0 ? (t + 1) / 10000 - 1 : t / 10000); }

int sec2 (int t)
{ return t / 10000 - (t % 10000 < 0); }

int sec3 (int t)
{ return (t + (t < 0)) / 10000 - (t < 0); }
===============================================================================

produces (with gcc-9.2.0, x86_64):

sec1:
        testl   %edi, %edi
        js      .L5
        movl    %edi, %eax
        movl    $1759218605, %edx
        sarl    $31, %edi
        imull   %edx
        movl    %edx, %eax
        sarl    $12, %eax
        subl    %edi, %eax
        ret
.L5:
        addl    $1, %edi
        movl    $1759218605, %edx
        movl    %edi, %eax
        sarl    $31, %edi
        imull   %edx
        sarl    $12, %edx
        subl    %edi, %edx
        leal    -1(%rdx), %eax
        ret

sec2:
        movl    %edi, %eax
        movl    $1759218605, %edx
        imull   %edx
        movl    %edx, %eax
        movl    %edi, %edx
        sarl    $31, %edx
        sarl    $12, %eax
        subl    %edx, %eax
        imull   $10000, %eax, %edx
        subl    %edx, %edi
        shrl    $31, %edi
        subl    %edi, %eax
        ret

sec3:
        movl    %edi, %ecx
        movl    $1759218605, %edx
        shrl    $31, %ecx
        addl    %ecx, %edi
        movl    %edi, %eax
        sarl    $31, %edi
        imull   %edx
        movl    %edx, %eax
        sarl    $12, %eax
        subl    %edi, %eax
        subl    %ecx, %eax
        ret
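
A note on the constant that appears in all three listings: 1759218605 is
ceil (2^44 / 10000), and the imull / sarl $12 / subl sequence is a division by
10000 done as a multiply-high plus shift, with a +1 correction for negative
inputs. The following is only a sketch of that idea in C, not gcc's actual
algorithm as implemented. It assumes a 32-bit int and that >> on a negative
signed value is an arithmetic shift (true for gcc and clang, but
implementation-defined in ISO C). The names div10000_magic and
div10000_magic_mismatches are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Truncating division by 10000 without a divide instruction.
   Assumption: >> on negative signed values shifts arithmetically.  */
static int
div10000_magic (int t)
{
  /* 1759218605 == ceil (2^44 / 10000).  */
  int64_t prod = (int64_t) t * 1759218605;
  /* High 32 bits of the product (imull), then 12 more bits (sarl $12).  */
  int q = (int) (prod >> 44);
  /* t >> 31 is -1 for negative t, so this adds 1 and rounds toward zero.  */
  return q - (t >> 31);
}

/* Hypothetical helper: counts disagreements with the built-in '/'
   on a sample of the int range.  */
static int
div10000_magic_mismatches (void)
{
  int bad = 0;
  assert (sizeof (int) * CHAR_BIT == 32);   /* the sketch assumes 32-bit int */
  for (long long t = INT_MIN; t <= INT_MAX; t += 9973)
    bad += div10000_magic ((int) t) != (int) t / 10000;
  bad += div10000_magic (INT_MAX) != INT_MAX / 10000;
  return bad;
}
```

Calling div10000_magic_mismatches () from a small main is enough to try it;
the sampling step 9973 is arbitrary and only keeps the run short.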

And the benchmark:
===============================================================================
#include <stdlib.h>

static inline int sec1 (int t)
{ return (t < 0 ? (t + 1) / 1000 - 1 : t / 1000); }

static inline int sec2 (int t)
{ return t / 1000 - (t % 1000 < 0); }

static inline int sec3 (int t)
{ return (t + (t < 0)) / 1000 - (t < 0); }

volatile int t = 347913194;
volatile int x;

int
main (int argc, char *argv[])
{
  /* The iteration count is taken from the command line.  */
  if (argc < 2)
    return 1;
  int repeat = atoi (argv[1]);
  int i;

  for (i = repeat; i > 0; i--)
    x = sec1 (t); // or sec2 (t) or sec3 (t)
  return 0;
}
===============================================================================
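
As a sanity check alongside the timings, the three variants from the benchmark
can be compared against the defining property of floor division, namely
q * 1000 <= t < (q + 1) * 1000. This is a minimal sketch, assuming a 32-bit
int; is_floor_div1000 and check_sec_variants are hypothetical helpers and not
part of Gnulib.

```c
#include <assert.h>
#include <limits.h>

static inline int sec1 (int t)
{ return (t < 0 ? (t + 1) / 1000 - 1 : t / 1000); }

static inline int sec2 (int t)
{ return t / 1000 - (t % 1000 < 0); }

static inline int sec3 (int t)
{ return (t + (t < 0)) / 1000 - (t < 0); }

/* Hypothetical helper: true if q == floor (t / 1000), checked in a wider
   type so that the bounds cannot overflow.  */
static int
is_floor_div1000 (int t, int q)
{
  long long lo = (long long) q * 1000;
  return lo <= t && t < lo + 1000;
}

/* Hypothetical helper: asserts that all three variants round toward
   negative infinity on a few boundary values.  */
static void
check_sec_variants (void)
{
  static const int samples[] =
    { INT_MIN, INT_MIN + 1, -2001, -2000, -1001, -1000, -999, -1,
      0, 1, 999, 1000, 1001, 347913194, INT_MAX };
  for (unsigned int i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      int t = samples[i];
      assert (is_floor_div1000 (t, sec1 (t)));
      assert (is_floor_div1000 (t, sec2 (t)));
      assert (is_floor_div1000 (t, sec3 (t)));
    }
}
```

The sample list is deliberately concentrated around multiples of 1000 and the
ends of the int range, since those are the inputs where the variants could
plausibly diverge.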

On an Intel Core m3 CPU:

                 gcc             clang

sec1           1.25 ns          1.14 ns
sec2           1.78 ns          1.63 ns
sec3           1.68 ns          1.73 ns

And on sparc64:

                 gcc

sec1           7.24 ns
sec2           7.51 ns
sec3           7.24 ns

And on aarch64:

                 gcc

sec1           3.54 ns
sec2           5.00 ns
sec3           4.59 ns

Interesting observations here:
* While on x86_64 and sparc64 the 32-bit division takes
  approximately as much time as the 64-bit division, on
  aarch64 it is 6 to 11 times faster!
* On x86_64, clang optimizes sec2 better than sec3.
  That's a bit paradoxical, because sec2 has an imulq and an imull
  instruction, whereas sec3 has only an imulq instruction.

Regarding your fourth patch:

>              - (corr_quad + (corr_quad < 0)) / 25 - (corr_quad < 0)

Shouldn't that be parenthesized differently?

               - ((corr_quad + (corr_quad < 0)) / 25 - (corr_quad < 0))
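
To illustrate the difference: unary minus binds tighter than '/', so without
the extra parentheses only the dividend is negated and the (corr_quad < 0)
term is still subtracted. A small sketch, assuming the intent is to subtract
floor (corr_quad / 25); the function names as_written, parenthesized and
check_parenthesization are hypothetical and the surrounding expression from
the patch is not reproduced here.

```c
#include <assert.h>

/* As written in the patch: parses as
   (-(corr_quad + (corr_quad < 0))) / 25 - (corr_quad < 0).  */
static int
as_written (int corr_quad)
{ return - (corr_quad + (corr_quad < 0)) / 25 - (corr_quad < 0); }

/* With the extra parentheses: the negation of floor (corr_quad / 25).  */
static int
parenthesized (int corr_quad)
{ return - ((corr_quad + (corr_quad < 0)) / 25 - (corr_quad < 0)); }

/* Hypothetical helper: the two forms agree for non-negative inputs
   and differ for negative ones.  */
static void
check_parenthesization (void)
{
  assert (as_written (50) == -2 && parenthesized (50) == -2);
  assert (as_written (-1) == -1 && parenthesized (-1) == 1);
  assert (as_written (-26) == 0 && parenthesized (-26) == 2);
}
```

For corr_quad >= 0 the two forms give the same value; for corr_quad < 0 the
sign of the correction term flips, so the results differ by 2.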

Bruno



