Re: bug in xtime.h
From: Bruno Haible
Subject: Re: bug in xtime.h
Date: Tue, 24 Dec 2019 10:34:07 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-169-generic; KDE/5.18.0; x86_64; ; )
Paul Eggert wrote:
> the remaining patches are for other instances of this idiom in Gnulib.
These other instances use 'int', not 'long long'. The machine code is similar:
===============================================================================
int sec1 (int t)
{ return (t < 0 ? (t + 1) / 10000 - 1 : t / 10000); }
int sec2 (int t)
{ return t / 10000 - (t % 10000 < 0); }
int sec3 (int t)
{ return (t + (t < 0)) / 10000 - (t < 0); }
===============================================================================
produces (with gcc-9.2.0)
sec1:
testl %edi, %edi
js .L5
movl %edi, %eax
movl $1759218605, %edx
sarl $31, %edi
imull %edx
movl %edx, %eax
sarl $12, %eax
subl %edi, %eax
ret
.L5:
addl $1, %edi
movl $1759218605, %edx
movl %edi, %eax
sarl $31, %edi
imull %edx
sarl $12, %edx
subl %edi, %edx
leal -1(%rdx), %eax
ret
sec2:
movl %edi, %eax
movl $1759218605, %edx
imull %edx
movl %edx, %eax
movl %edi, %edx
sarl $31, %edx
sarl $12, %eax
subl %edx, %eax
imull $10000, %eax, %edx
subl %edx, %edi
shrl $31, %edi
subl %edi, %eax
ret
sec3:
movl %edi, %ecx
movl $1759218605, %edx
shrl $31, %ecx
addl %ecx, %edi
movl %edi, %eax
sarl $31, %edi
imull %edx
movl %edx, %eax
sarl $12, %eax
subl %edi, %eax
subl %ecx, %eax
ret
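A note on the listings above: none of them contains a divide instruction. The compiler replaces the division by 10000 with a multiplication by 1759218605 (that is 2^44 / 10000, rounded up) followed by a right shift and a sign correction. A minimal C sketch of the same computation follows. The function names are made up for illustration, and the sketch assumes that >> on a negative int64_t is an arithmetic shift, which holds for gcc and clang but is implementation-defined in ISO C.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Truncating division by 10000 via multiply and shift.
   The shift by 44 corresponds to taking the high 32 bits of the
   64-bit product (imull) followed by "sarl $12".
   Assumption: >> on a negative int64_t shifts arithmetically.  */
int
div10000_by_multiply (int t)
{
  int64_t product = (int64_t) t * 1759218605;
  int high = (int) (product >> 44); /* for t < 0: one less than t / 10000 */
  return high + (t < 0);            /* same effect as subtracting t >> 31 */
}

/* Compare against the built-in division on a few boundary values.  */
void
check_div10000_by_multiply (void)
{
  static const int samples[] =
    { 0, 1, 9999, 10000, 10001, 347913194, INT_MAX,
      -1, -9999, -10000, -10001, INT_MIN };
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (div10000_by_multiply (samples[i]) == samples[i] / 10000);
}
```

Calling check_div10000_by_multiply () from a main function exercises the boundary cases. It is a spot check on selected values, not a proof for all inputs.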
And the benchmark:
===============================================================================
#include <stdlib.h>
static inline int sec1 (int t)
{ return (t < 0 ? (t + 1) / 1000 - 1 : t / 1000); }
static inline int sec2 (int t)
{ return t / 1000 - (t % 1000 < 0); }
static inline int sec3 (int t)
{ return (t + (t < 0)) / 1000 - (t < 0); }
volatile int t = 347913194;
volatile int x;
int
main (int argc, char *argv[])
{
  int repeat = atoi (argv[1]);
  int i;
  for (i = repeat; i > 0; i--)
    x = sec1 (t); // or sec2 (t) or sec3 (t)
}
===============================================================================
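The benchmark times a single positive input, so it does not show whether the three variants agree on negative values. The sketch below checks them against a reference floor division on a few boundary values. It uses the divisor 1000 from the benchmark; floor_div1000 and check_variants are names introduced here for illustration only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

int sec1 (int t) { return t < 0 ? (t + 1) / 1000 - 1 : t / 1000; }
int sec2 (int t) { return t / 1000 - (t % 1000 < 0); }
int sec3 (int t) { return (t + (t < 0)) / 1000 - (t < 0); }

/* Reference: floor (t / 1000), computed in 64 bits so that
   negating INT_MIN cannot overflow.  */
int
floor_div1000 (int t)
{
  int64_t w = t;
  return (int) (w >= 0 ? w / 1000 : -((-w + 999) / 1000));
}

/* All three variants should round toward minus infinity.  */
void
check_variants (void)
{
  static const int samples[] =
    { 0, 1, 999, 1000, 1001, 347913194, INT_MAX,
      -1, -999, -1000, -1001, INT_MIN };
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      int want = floor_div1000 (samples[i]);
      assert (sec1 (samples[i]) == want);
      assert (sec2 (samples[i]) == want);
      assert (sec3 (samples[i]) == want);
    }
}
```

None of the three variants overflows at INT_MIN or INT_MAX: sec1 and sec3 add 1 only when t is negative, and sec2 never adds to t.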
On an Intel Core m3 CPU:
          gcc        clang
  sec1    1.25 ns    1.14 ns
  sec2    1.78 ns    1.63 ns
  sec3    1.68 ns    1.73 ns
And on sparc64:
          gcc
  sec1    7.24 ns
  sec2    7.51 ns
  sec3    7.24 ns
And on aarch64:
          gcc
  sec1    3.54 ns
  sec2    5.00 ns
  sec3    4.59 ns
Interesting observations here:
* While on x86_64 and sparc64 the 32-bit division takes
approximately as much time as the 64-bit division, on
aarch64 it is 6 to 11 times faster!
* On x86_64, clang optimizes sec2 better than sec3.
That's a bit paradoxical, because sec2 has an imulq and an imull
instruction, whereas sec3 has only an imulq instruction.
Regarding your fourth patch:
> - (corr_quad + (corr_quad < 0)) / 25 - (corr_quad < 0)
Shouldn't that be parenthesized differently?
- ((corr_quad + (corr_quad < 0)) / 25 - (corr_quad < 0))
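To see why the parentheses matter, assume the leading '-' in the quoted line is a binary minus that continues a larger expression. Subtraction associates to the left, so without the extra parentheses the trailing (corr_quad < 0) term is subtracted from the total instead of being part of the floor division. In the sketch below, 'base' is a hypothetical left operand and the function names are made up for illustration.

```c
#include <assert.h>

/* Floor division by 25, using the idiom from the patch.  */
int
floor_div25 (int q)
{
  return (q + (q < 0)) / 25 - (q < 0);
}

/* As quoted: parsed as (base - quotient) - (q < 0).  */
int
without_parens (int base, int q)
{
  return base - (q + (q < 0)) / 25 - (q < 0);
}

/* As suggested: base - floor (q / 25).  */
int
with_parens (int base, int q)
{
  return base - ((q + (q < 0)) / 25 - (q < 0));
}

void
check_parenthesization (void)
{
  /* For q >= 0 the correction term is 0, so both forms agree.  */
  assert (without_parens (100, 30) == with_parens (100, 30));
  /* For q < 0 they differ by 2.  */
  assert (with_parens (100, -1) == 100 - floor_div25 (-1));
  assert (with_parens (100, -1) - without_parens (100, -1) == 2);
}
```

The two forms differ only for negative corr_quad, so a test that uses only non-negative values would not catch the problem.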
Bruno