octave-bug-tracker
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Octave-bug-tracker] [bug #62329] Incorrect value (regression) for betai


From: Michael Leitner
Subject: [Octave-bug-tracker] [bug #62329] Incorrect value (regression) for betainc
Date: Wed, 20 Apr 2022 11:06:49 -0400 (EDT)

Follow-up Comment #7, bug #62329 (project octave):

I did some research, and std::pow() seems to not even come from the library,
but typically be provided by the compiler itself (at sufficient optimization
level). Thus, I did not find anything authoritative on how it is really
implemented, and therefore I tested it:

I used

++
a=.001*rand(10000000,1)+1;
--

to have nice doubles that also with high powers do not become sub- or
supranormal (or inf), which could necessitate special treatment. Then it is
just a question of 

++
tic;a.^b;toc
--

for various b. What I found on a Intel i5-2500 with 64-bit Debian is that for
any (also negative) non-integer exponent (of course real) it takes about 180
clock cycles per such power operation, which is perfectly consistent with the
combined cost of exp() and log() (the latter on its own takes about 100
cycles, exp() about 40, and then some multiplying and adding). Integer
exponents of large absolute value are the same. The following cases are
treated differently (more efficiently): exponents of 2 and 3, which take about
15 cycles, exponents of 0 and 1, which take about 45 cycles, and exponent -1
(equivalent in terms of speed to 1./a) takes 40 cycles. This seems a bit much,
as in compiled C programs on my architecture I can do inversion in about 20
cycles. And I have no idea whatsoever how the cases of 0 and 1 are implemented
-- of course both are trivial and should involve no calculation whatsoever,
but on the other hand there is no reason ever to use them. 

To compare, doing a.*a in octave is also about 15 cycles, a.*a.*a
correspondingly 30 cycles. 

Bottom lines: 
- the code path that is used by the x.^y operator (equivalent to the
power(x,y) function) on my architecture results in about a factor of ten more
efficient computation for y==2 or 3 compared to the general case, and thus
also compared to the proposed fix. But the expm1 and log1p is unconditionally
stable, thus I would propose to leave it. 
- it seems that there is an opportunity for improvement of some of the
lowest-level operations in octave, which could be interesting for Rik's
long-term ambition: First, could inversion possibly be sped up? As I see it,
there are not more checks to be performed than for multiplication, because
1./0 is inf automatically, isn't it? But the overhead of octave compared to C
is 20 cycles per operation, more than .^2 takes in total (in octave). And
further, .^ with integer exponents could definitely be made more efficient for
a wide range of exponents: by repeated squaring I can take the power of, e.g.,
127 within 11 cycles. Perhaps sometime I come round to write an oct-file
implementation as demonstration.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?62329>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]