Re: bug in xtime.h
From: Bruno Haible
Subject: Re: bug in xtime.h
Date: Mon, 23 Dec 2019 09:38:57 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-166-generic; KDE/5.18.0; x86_64; ; )
Hi Paul,
Now you got me hooked :)
> the following code should be faster than the other options mentioned,
> as it should avoid both conditional branches and compilers'
> overoptimizations.
>
> return (t + (t < 0)) / 1000000000 - (t < 0);
The assembly code indeed shows no conditional jump and only one 64-bit
multiplication instruction.
=================================================================
long long sec1 (long long t)
{ return (t < 0 ? (t + 1) / 1000000000 - 1 : t / 1000000000); }
long long sec2 (long long t)
{ return t / 1000000000 - (t % 1000000000 < 0); }
long long sec3 (long long t)
{ return (t + (t < 0)) / 1000000000 - (t < 0); }
=================================================================
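Just to double-check that the three variants really agree, here is a quick
sanity check I wrote for this mail (not part of the proposed patch), probing
values around the sign boundary and around a multiple of 10⁹:
=================================================================
#include <assert.h>

static long long sec1 (long long t)
{ return (t < 0 ? (t + 1) / 1000000000 - 1 : t / 1000000000); }
static long long sec2 (long long t)
{ return t / 1000000000 - (t % 1000000000 < 0); }
static long long sec3 (long long t)
{ return (t + (t < 0)) / 1000000000 - (t < 0); }

int
main (void)
{
  /* Values around the sign boundary and around a multiple of 10^9.  */
  static const long long samples[] =
    { -2000000001LL, -2000000000LL, -1999999999LL,
      -1LL, 0LL, 1LL, 999999999LL, 1000000000LL };
  int i;
  for (i = 0; i < 8; i++)
    {
      assert (sec1 (samples[i]) == sec2 (samples[i]));
      assert (sec2 (samples[i]) == sec3 (samples[i]));
    }
  return 0;
}
=================================================================
All eight cases agree; each variant computes the floor of t / 10⁹.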
With gcc:
sec3:
movabsq $1237940039285380275, %rdx
movq %rdi, %rcx
shrq $63, %rcx
addq %rcx, %rdi
movq %rdi, %rax
sarq $63, %rdi
imulq %rdx
movq %rdx, %rax
sarq $26, %rax
subq %rdi, %rax
subq %rcx, %rax
ret
With clang:
sec3:
movq %rdi, %rcx
shrq $63, %rcx
leaq (%rdi,%rcx), %rax
movabsq $1237940039285380275, %rdx
imulq %rdx
movq %rdx, %rax
shrq $63, %rax
sarq $26, %rdx
addq %rdx, %rax
subq %rcx, %rax
retq
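For what it's worth, my reading of this code (an observation of mine, not
something stated above): both compilers replace the division with a
fixed-point multiply by ceil(2^90 / 10^9) = 1237940039285380275, keeping the
high half of the 128-bit product, shifting right by another 26 bits
(64 + 26 = 90), and applying a sign correction. The constant can be
recomputed with __int128, which both GCC and clang support on x86_64:
=================================================================
#include <stdio.h>

int
main (void)
{
  /* ceil (2^90 / 10^9), the multiplier seen in the assembly above.  */
  unsigned __int128 p = (unsigned __int128) 1 << 90;
  unsigned long long magic =
    (unsigned long long) ((p + 999999999) / 1000000000);
  printf ("%llu\n", magic);   /* prints 1237940039285380275 */
  return 0;
}
=================================================================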
And the benchmark:
=================================================================
#include <stdlib.h>
static inline long long sec1 (long long t)
{ return (t < 0 ? (t + 1) / 1000000000 - 1 : t / 1000000000); }
static inline long long sec2 (long long t)
{ return t / 1000000000 - (t % 1000000000 < 0); }
static inline long long sec3 (long long t)
{ return (t + (t < 0)) / 1000000000 - (t < 0); }
volatile long long t = 1576800000000000000LL;
volatile long long x;
int
main (int argc, char *argv[])
{
  int repeat = atoi (argv[1]);
  int i;
  for (i = repeat; i > 0; i--)
    x = sec1 (t); /* or sec2 (t) or sec3 (t) */
}
=================================================================
On an Intel Core m3 CPU:
        gcc       clang
sec1    1.28 ns   1.03 ns
sec2    1.84 ns   1.79 ns
sec3    1.71 ns   1.62 ns
And on sparc64:
        gcc
sec1    7.79 ns
sec2    8.07 ns
sec3    7.93 ns
And on aarch64:
        gcc
sec1    27.5 ns
sec2    55.0 ns
sec3    27.5 ns
Interpreting the results:
- aarch64 has a slow 'sdiv' instruction, so slow that conditional
branches don't matter.
- On sparc64, the sdivx instruction takes much more time than the
mulx instruction.
- My micro-benchmark, with 10⁹ repetitions in a row, benefits from branch
prediction more than a realistic workload would: in practice, the branch
predictor's state would be filled with other stuff. This means the
realistic time for sec1 should be higher than what I measured; by how
much, I can't really say.
- Since branch prediction does not matter for sec2 and sec3, we can
see that sec3 is always faster than sec2.
=> I'm in favour of installing your new formula.
Bruno