octave-bug-tracker
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization


From: Nicholas Jankowski
Subject: [Octave-bug-tracker] [bug #60818] delaunayn - 2D code path vectorization doesn't match nD algorithm
Date: Wed, 30 Jun 2021 20:18:25 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36

Follow-up Comment #18, bug #60818 (project octave):

looking at timing. It appears that Octave LU calls are just very computational
and memory expensive:

using the profiler to look at a 3D case:


Laplace Expansion:
>> rand("state", [1:625]');
>> x = rand(100,1)*4-2;y = rand(100,1)*4-2; z = rand(100,1)*4-2;
>> profile clear; profile on; for idx = 1:10000, delaunayn([x,y,z]);endfor,
profile off; profshow

   #      Function Attr     Time (s)   Time (%)        Calls
------------------------------------------------------------
   5 __delaunayn__            13.772      78.08        10000
   1     delaunayn             1.899      10.77        10000
  13         cross             1.245       7.06        10000
  12      binary -             0.088       0.50        60000
  19           cat             0.088       0.50        10000
  18     binary .*             0.087       0.50        80000
  22          sqrt             0.080       0.45        30000
  21         sumsq             0.070       0.40        30000
   6           isa             0.044       0.25        10000
  20           dot             0.035       0.20        10000
   3      binary <             0.030       0.17        50000
   9          size             0.024       0.14        30000
   7           eps             0.023       0.13        10000
   2        nargin             0.022       0.13        50001
  15         ndims             0.022       0.12        30000
  10     binary ==             0.019       0.11        30000
  16          ones             0.018       0.10        10000
  23     binary ./             0.017       0.10        10000
  11           any             0.017       0.10        10000
  24           abs             0.013       0.07        10000


using the LU decomposition:

>> clear all
>> rand("state", [1:625]');
>> x = rand(100,1)*4-2;y = rand(100,1)*4-2; z = rand(100,1)*4-2; p =
rand(100,1)*4-2;
q = rand(100,1)*4-2; r = rand(100,1)*4-2; s = rand(100,1)*4-2;
>> profile clear; profile on; for idx = 1:10000, delaunayn([x,y,z]);endfor,
profile of
f; profshow
   #      Function Attr     Time (s)   Time (%)        Calls
------------------------------------------------------------
  23            lu            14.207      40.10        10000
   5 __delaunayn__            12.965      36.59        10000
   1     delaunayn             4.423      12.48        10000
  27        unique             0.783       2.21        10000
  15          kron             0.753       2.13        30000
  22       logical             0.576       1.63        10000
  17         speye             0.368       1.04        10000
  16        sparse             0.218       0.62        20000
  24          diag             0.184       0.52        10000
   3      binary <             0.182       0.51        30000
  25           abs             0.095       0.27        10000
  13      binary -             0.078       0.22        70000
  12    postfix .'             0.078       0.22        20000
  26           max             0.073       0.21        10000
  21          true             0.055       0.16        30000
   7           eps             0.051       0.14        20000
   6           isa             0.038       0.11        10000
  36         zeros             0.036       0.10        10000
  30         false             0.028       0.08        20001
  14          ones             0.028       0.08        20000


so at least in 3D, for 100 points the LU approach is spending just as much
time in LU as in __delaunayn__, plus ~1/2 of __delaunayn__ again on other ops
within delaunayn.  The Laplace expansion avoids most other function calls
except for cross, spending a total of ~1/3 of the time spent in
__delaunayn__.

jumping out to 6D:


LU:

>> clear all
>> x = rand(100,1)*4-2;y = rand(100,1)*4-2; z = rand(100,1)*4-2; p =
rand(100,1)*4-2;q = rand(100,1)*4-2; r = rand(100,1)*4-2;
>> profile clear; profile on; for idx = 1:100,
delaunayn([x,y,z,p,q,r]);endfor, profile off; profshow
   #      Function Attr     Time (s)   Time (%)        Calls
------------------------------------------------------------
  23            lu            26.310      47.92          100
   5 __delaunayn__            12.579      22.91          100
   1     delaunayn             9.997      18.21          100
  15          kron             2.593       4.72          300
  22       logical             1.388       2.53          100
  12    postfix .'             0.857       1.56          200
...


Laplace expansion:

>> clear all
>> x = rand(100,1)*4-2;y = rand(100,1)*4-2; z = rand(100,1)*4-2; p =
rand(100,1)*4-2;q = rand(100,1)*4-2; r = rand(100,1)*4-2;
>> profile clear; profile on; for idx = 1:100,
delaunayn([x,y,z,p,q,r]);endfor, profile off; profshow
   #          Function Attr     Time (s)   Time (%)        Calls
----------------------------------------------------------------
   5     __delaunayn__            14.085      57.97          100
  22         binary .*             1.971       8.11        87600
  17             cross             1.690       6.95        12000
  15 delaunayn>detvec4             1.341       5.52         3000
  12          binary -             1.335       5.49        44100
  14 delaunayn>detvec5             0.910       3.75          600
  24               dot             0.869       3.57        12000
...


not much improvement.


upping to 1000pts:


3D, 1000pts

LU:
   #      Function Attr     Time (s)   Time (%)        Calls
------------------------------------------------------------
  23            lu            28.536      55.89         1000
   5 __delaunayn__            17.235      33.76         1000
   1     delaunayn             2.665       5.22         1000
  15          kron             0.811       1.59         3000
  22       logical             0.674       1.32         1000


Laplace Exp:
   #          Function Attr     Time (s)   Time (%)        Calls
----------------------------------------------------------------
   5     __delaunayn__            17.201      94.61         1000
   1         delaunayn             0.443       2.43         1000
  14             cross             0.143       0.79         1000
  12          binary -             0.081       0.44         6000



6D, 1000pts:

LU decomp:
   #      Function Attr     Time (s)   Time (%)        Calls
------------------------------------------------------------
  23            lu            97.561      40.34           10
   5 __delaunayn__            85.050      35.17           10
   1     delaunayn            35.671      14.75           10
  15          kron             9.233       3.82           30
  22       logical             5.001       2.07           10
  12    postfix .'             3.478       1.44           20

(peak mem about 3GB, total time ~241s)

Laplace expansion:
   #          Function Attr     Time (s)   Time (%)        Calls
----------------------------------------------------------------
   5     __delaunayn__            85.856      35.19           10
  22         binary .*            55.332      22.68         8820
  12          binary -            30.297      12.42         4410
  15 delaunayn>detvec4            20.128       8.25          300
  23               cat            10.416       4.27         1200
  24               dot            10.393       4.26         1200
 
(peak mem about 700MB, total time ~244s)

so here at the cost of 3x the memory, LU decomp breaks even at 6D.


upping to 10000 points.  


3D, 10kpts:
LU:
   #      Function Attr     Time (s)   Time (%)        Calls
------------------------------------------------------------
  23            lu            41.780      47.47          100
   5 __delaunayn__            27.882      31.68          100
   1     delaunayn            10.659      12.11          100
  15          kron             2.891       3.29          300
  22       logical             1.455       1.65          100


Laplace expansion:
   #      Function Attr     Time (s)   Time (%)        Calls
------------------------------------------------------------
   5 __delaunayn__            27.508      91.32          100
   1     delaunayn             1.558       5.17          100
  12      binary -             0.665       2.21          600
  19           cat             0.099       0.33          100
  22          sqrt             0.076       0.25          300
  18     binary .*             0.073       0.24          800

6D:

Laplace expansion: 
   #          Function Attr     Time (s)   Time (%)        Calls
----------------------------------------------------------------
   5     __delaunayn__           195.219      44.54            1
  22         binary .*            87.128      19.88          876
  12          binary -            46.551      10.62          441
  15 delaunayn>detvec4            29.981       6.84           30
  24               dot            16.814       3.84          120

peak memory usage by Octave in the first few minutes hit about 3GB, then 7GB
toward the end

LU:
peak memory usage by Octave in the first few minutes hit about 3GB, then 13GB
toward the end, then 
hit an out of memory limit. 

incomplete profshow:
>> profshow
   #      Function Attr     Time (s)   Time (%)        Calls
------------------------------------------------------------
   5 __delaunayn__           198.244      54.59            1
   1     delaunayn           132.159      36.39            1
  15          kron            13.536       3.73            2
  22       logical             8.555       2.36            1
  12    postfix .'             5.262       1.45            2
  13      binary -             3.891       1.07            5
  16        sparse             1.474       0.41            2
  23       profile             0.011       0.00            1
  24     binary !=             0.002       0.00            1


I think it crashed out during the LU decomp, which is just showing up as
delaunayn time.  Not sure how well that time can be trusted, since the times I
was hitting 97+ memory I saw the disk usage spike as it started trying to
diskswap.  

So in summary, seems like with Octave's LU it starts to be faster at higher
dimensions for larger arrays, but at a big memory cost. 

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?60818>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]