From: Dan Sebald
Subject: [Octave-bug-tracker] [bug #54572] int64 does not saturate correctly in negative direction
Date: Wed, 29 Aug 2018 23:07:20 -0400 (EDT)

Follow-up Comment #36, bug #54572 (project octave):

Ah, so that was a reason for some of this, i.e., the 53-bit/64-bit
discrepancy.  Since 2008, I'm guessing you introduced templates, which may
have obviated that reason.
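
For readers following along: the "53-bit/64-bit discrepancy" presumably
refers to a double carrying only a 53-bit significand, so not every int64
value survives a trip through double arithmetic.  The sketch below only
illustrates that assumption; the helper name round_trips_through_double is
hypothetical and is not taken from the Octave sources.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true if v converts to double and back unchanged.
bool round_trips_through_double (std::int64_t v)
{
  double d = static_cast<double> (v);

  // 2^63 does not fit in int64, so converting it back would be undefined;
  // treat that case as "did not survive".
  if (d >= 9223372036854775808.0)
    return false;

  return static_cast<std::int64_t> (d) == v;
}
```

Under this sketch 2^53 survives the round trip, while 2^53 + 1 and
intmax('int64') do not, because both need more than 53 significant bits.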

In any case, I've done some tests here with various builds: with and without
HAVE_FAST_OPS defined, and a version that uses GCC's __builtin_add_overflow().
I ran the following script on each build:


w = int64(ones(5000));
x = w;
y = w;
start = cputime; for i=[1:10]; z = x + y; endfor; tgood = cputime - start
z(1:2)
x = w; x(:) = intmax('int64');
y = w;
start = cputime; for i=[1:10]; z = x + y; endfor; tover = cputime - start
z(1:2)
tover / tgood
clear w x y z

w = int32(ones(5000));
x = w;
y = w;
start = cputime; for i=[1:10]; z = x + y; endfor; tgood = cputime - start
z(1:2)
x = w; x(:) = intmax('int32');
y = w;
start = cputime; for i=[1:10]; z = x + y; endfor; tover = cputime - start
z(1:2)
tover / tgood
clear w x y z

w = int16(ones(5000));
x = w;
y = w;
start = cputime; for i=[1:10]; z = x + y; endfor; tgood = cputime - start
z(1:2)
x = w; x(:) = intmax('int16');
y = w;
start = cputime; for i=[1:10]; z = x + y; endfor; tover = cputime - start
z(1:2)
tover / tgood
clear w x y z

w = int8(ones(5000));
x = w;
y = w;
start = cputime; for i=[1:10]; z = x + y; endfor; tgood = cputime - start
z(1:2)
x = w; x(:) = intmax('int8');
y = w;
start = cputime; for i=[1:10]; z = x + y; endfor; tover = cputime - start
z(1:2)
tover / tgood
clear w x y z


And here are the results (CPU time in seconds; to/tg is tover / tgood):


64-BIT ADD
       WITH_FAST_INT   NO_FAST_INT   GCC_BUILTINS

tgood     2.5840         2.6480         2.6320
tover     3.2720         3.3240         3.2840
to/tg     1.2663         1.2553         1.2477

32-BIT ADD
       WITH_FAST_INT   NO_FAST_INT   GCC_BUILTINS

tgood     1.2960         1.3880         1.3000
tover     1.6920         1.7080         1.7240
to/tg     1.3056         1.2305         1.3262

16-BIT ADD
       WITH_FAST_INT   NO_FAST_INT   GCC_BUILTINS

tgood     0.73200        0.81200        0.73600
tover     0.98000        1.0720         1.4160
to/tg     1.3388         1.3202         1.9239

8-BIT ADD
       WITH_FAST_INT   NO_FAST_INT   GCC_BUILTINS

tgood     0.42000        0.43600        0.38400
tover     0.70000        0.70000        0.77200
to/tg     1.6667         1.6055         2.0104


That aberration of 1.416 seconds for GCC_BUILTINS with 16-bit adds is not due
to a system stall.  It actually is some inefficiency in the builtin routine for
that particular data width.  I should point out that I used the generic
builtin overflow routine (__builtin_add_overflow), not the ones specialized to
long long etc.; I couldn't figure out a way to map the template type T (e.g.,
"long long" or "int64") to the appropriate specialized builtin.  It's also
interesting to see the CPU features at work (Xeon/x86): the times decrease as
the data width is reduced.
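
One possible way to do the mapping mentioned above is plain overload
resolution: GCC's type-specific builtins exist only for int, long and long
long, so a generic template can cover the narrower types.  This is a minimal
sketch under that assumption, not the code used for the GCC_BUILTINS build;
the names add_overflow and sat_add are hypothetical.  It requires GCC 5 or
later, or Clang.

```cpp
#include <cstdint>
#include <limits>

// Fallback for types without a type-specific builtin (e.g. int8, int16).
template <typename T>
bool add_overflow (T x, T y, T *r)
{ return __builtin_add_overflow (x, y, r); }

// Non-template overloads win over the template on an exact match, so each
// signed type is routed to its own builtin without naming it in the caller.
inline bool add_overflow (int x, int y, int *r)
{ return __builtin_sadd_overflow (x, y, r); }

inline bool add_overflow (long x, long y, long *r)
{ return __builtin_saddl_overflow (x, y, r); }

inline bool add_overflow (long long x, long long y, long long *r)
{ return __builtin_saddll_overflow (x, y, r); }

// Saturating add for a signed integer type T.
template <typename T>
T sat_add (T x, T y)
{
  T sum;
  if (add_overflow (x, y, &sum))
    // Signed overflow needs both operands to share a sign, so the sign of
    // y tells which limit was crossed.
    return y < 0 ? std::numeric_limits<T>::min ()
                 : std::numeric_limits<T>::max ();
  return sum;
}
```

With this sketch, sat_add<std::int64_t> picks the long or long long builtin
depending on how the platform defines int64_t, while sat_add<std::int16_t>
falls through to the generic builtin.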

CONCLUSION: GCC now seems to optimize well enough that neither the
HAVE_FAST_OPS code path nor __builtin_add_overflow() seems necessary, at least
not with GCC.  I could rebuild without the -O2 flag and redo the tests, but I'm
not interested enough.
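
For comparison, saturation can also be written with ordinary comparisons and
no compiler builtins, which is the kind of code an optimizer has to handle
well for the conclusion above to hold.  The sketch below is only an
illustration and is not necessarily what any of the three builds does; the
name sat_add_portable is hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Saturating add for a signed integer type T using only comparisons.
template <typename T>
T sat_add_portable (T x, T y)
{
  constexpr T tmax = std::numeric_limits<T>::max ();
  constexpr T tmin = std::numeric_limits<T>::min ();

  // The limits are checked before adding, so the sum itself never
  // overflows: tmax - y is safe for y > 0, tmin - y is safe for y < 0.
  if (y > 0 && x > tmax - y)
    return tmax;
  if (y < 0 && x < tmin - y)
    return tmin;

  return static_cast<T> (x + y);
}
```

Both directions are handled symmetrically here, including the negative
direction that this bug report is about.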

--

The following is the same test but using "tic/toc".  The result is pretty
much the same, so there is no real need to look at it:


w = int64(ones(5000));
x = w;
y = w;
tic; for i=[1:10]; x + y; endfor; toc
x = w; x(:) = intmax('int64');
y = w;
tic; for i=[1:10]; x + y; endfor; toc
clear w x y z

w = int32(ones(5000));
x = w;
y = w;
tic; for i=[1:10]; x + y; endfor; toc
x = w; x(:) = intmax('int32');
y = w;
tic; for i=[1:10]; x + y; endfor; toc
clear w x y z

w = int16(ones(5000));
x = w;
y = w;
tic; for i=[1:10]; x + y; endfor; toc
x = w; x(:) = intmax('int16');
y = w;
tic; for i=[1:10]; x + y; endfor; toc
clear w x y z

w = int8(ones(5000));
x = w;
y = w;
tic; for i=[1:10]; x + y; endfor; toc
x = w; x(:) = intmax('int8');
y = w;
tic; for i=[1:10]; x + y; endfor; toc
clear w x y z



64-BIT ADD
       WITH_FAST_INT   NO_FAST_INT   GCC_BUILTINS

tgood     2.60085        2.60146       2.61581
tover     3.30584        3.28829       3.2446

32-BIT ADD
       WITH_FAST_INT   NO_FAST_INT   GCC_BUILTINS

tgood     1.34028        1.36915       1.3331
tover     1.70602        1.71057       1.66417

16-BIT ADD
       WITH_FAST_INT   NO_FAST_INT   GCC_BUILTINS

tgood     0.781169       0.800373      0.768512
tover     0.996119       0.9839        1.37523

8-BIT ADD
       WITH_FAST_INT   NO_FAST_INT   GCC_BUILTINS

tgood     0.430963       0.435878      0.403436
tover     0.739937       0.737365      0.766774


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?54572>
