octave-bug-tracker
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow


From: Rik
Subject: [Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow for DIM=2
Date: Sat, 14 Aug 2021 11:18:37 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36

Follow-up Comment #8, bug #60928 (project octave):

My comments were cut off which is a shame because I had written quite a bit. 
I have no energy to reproduced them so here is a brief summary.

The gather/scatter routine when stride != 1 is shown below (Array.cc:1843)


          // gather and partition out NaNs.
          // FIXME: impact on integer types noticeable?
          octave_idx_type kl = 0;
          octave_idx_type ku = ns;
          for (octave_idx_type i = 0; i < ns; i++)
            {
              T tmp = ov[i*stride + offset];
              if (sort_isnan<T> (tmp))
                buf[--ku] = tmp;
              else
                buf[kl++] = tmp;
            }

          // sort.
          lsort.sort (buf, kl);

          if (ku < ns)
            {
              // NaNs are in reverse order
              std::reverse (buf + ku, buf + ns);
              if (mode == DESCENDING)
                std::rotate (buf, buf + ku, buf + ns);
            }

          // scatter.
          for (octave_idx_type i = 0; i < ns; i++)
            v[i*stride + offset] = buf[i];


There are two potential cache inefficiencies, one at the gather stage and one
at the scatter stage.  If the first dimension is being sorted then the data
values are arranged one after another and there is no calculation necessary to
retrieve the correct values.  Moreover, the scatter phase can be skipped
entirely.  Instead of gathering data into a temporary buffer, sorting the temp
buffer, and then copying data from the temp buffer to the destination the
output memory, the output memory pointer can just be handed to the sort
routine.  That also improves efficiency.

Given that Array<T>::permute is templated code in Array.cc, I think the change
can occur quite easily.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?60928>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/




reply via email to

[Prev in Thread] Current Thread [Next in Thread]